Oxford Centre for Computational Neuroscience

Professor Edmund T. Rolls

Discoveries on the Neuroscience of Vision













Computational Neuroscience of Vision


Overview: Rolls and colleagues discovered face-selective neurons in the inferior temporal visual cortex; showed how these and object-selective neurons have translation, size, contrast, spatial frequency, and in some cases even view invariance; and showed how neurons encode information using mainly sparse distributed firing-rate encoding. These neurophysiological investigations are complemented by VisNet, one of the few biologically plausible models of how face and object recognition are implemented in the brain. These discoveries are further complemented by investigation of the visual processing streams in the human brain using effective connectivity. Key descriptions are in 639, B16, 656, 682 and 508.


The discovery of face-selective neurons (in the amygdala (38, 91, 97), inferior temporal visual cortex (38A, 73, 91, 96, 162), and orbitofrontal cortex (397)) (see 412, 451, 501, B11, B12, B16). Our discovery of face-selective neurons in the inferior temporal visual cortex was published in 1979 (Perrett, Rolls and Caan, 38A), and in the amygdala in 1979 (Sanghera, Rolls and Roper-Hall, 38), with follow-up papers in 1982 (Perrett, Rolls and Caan, 73) and 1984 (Rolls, 91), and for the amygdala in 1985 (Leonard, Rolls, Wilson and Baylis, 97). These discoveries were followed up and confirmed by others (Desimone, Albright, Gross and Bruce, 1984, J. Neuroscience). The discovery of face neurons is described by Rolls (2011, 501).

 

The discovery of face-expression-selective neurons in the cortex in the superior temporal sulcus (114, 126) and orbitofrontal cortex (397). Reduced connectivity in this system has been identified in autism (541, 609).

 

The discovery that visual neurons in the inferior temporal visual cortex implement translation-, view-, size-, lighting-, and spatial-frequency-invariant representations of faces and objects (91, 108, 127, 191, 248, B12, B16).


Analysis of the effective connectivity of the human prefrontal cortex using the HCP-MMP human brain atlas has identified distinct systems involved in visual working memory (660).

 

In natural scenes, the receptive fields of inferior temporal cortex neurons shrink to approximately the size of objects, revealing a mechanism that simplifies object recognition (320, 516, B12, B16).

 

Top-down attentional control of visual processing by inferior temporal cortex neurons in complex natural scenes (445).

 

The discovery that in natural scenes, inferior temporal visual cortex neurons encode information about the locations of objects relative to the fovea, thus encoding information useful in scene representations (395, 455, 516).

 

The discovery that the inferior temporal visual cortex encodes information about the identity of objects, but not about their reward value, as shown by reversal and devaluation investigations (32, 320, B11). This provides the foundation for a key principle in primates including humans: the reward value and emotional valence of visual stimuli are represented in the orbitofrontal cortex, as shown by one-trial reversal learning and devaluation investigations (79, 212, 216) (and to some extent in the amygdala: 38, 383, B11), whereas in the visual cortical areas before the orbitofrontal cortex the representations are of objects and stimuli independently of their value (B11, B13, B14, B16). This provides for the separation of emotion from perception (B11, B16, 674).


The discovery that information is encoded using a sparse distributed graded firing-rate representation, with approximately independent information conveyed by different neurons (at least up to tens of neurons) (172, 196, 204, 225, 227, 255, 321, 419, 474, 508, 553, 561, B12, B16). (These discoveries argue against ‘grandmother cells’.) The representation is decodable by neuronally plausible dot product decoding, and is thus suitable for the associative computations performed in the brain (231, B12). Quantitatively, relatively little information is encoded and transmitted by stimulus-dependent ('noise') cross-correlations between neurons (265, 329, 348, 351, 369, 517). Much of the information is available from the firing rates very rapidly, within 20-50 ms (193, 197, 257, 407). All these discoveries are important for our understanding of computation and information transmission in the brain (B12, B16).
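
To make these coding claims concrete, here is a minimal Python sketch (with invented population sizes, rate distributions, and noise levels, not data from the cited papers) of a sparse distributed firing-rate code and dot product decoding; the sparseness measure a = (mean r)^2 / mean(r^2) follows the form used in this literature.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_stimuli, n_trials = 50, 10, 200

# Hypothetical tuning matrix: each stimulus evokes a graded, sparse
# firing-rate profile across the population (exponential-like rates).
tuning = rng.exponential(scale=5.0, size=(n_stimuli, n_neurons))

def sparseness(rates):
    """Population sparseness a = (mean r)^2 / mean(r^2): a is near 0 for
    very sparse codes and 1 when all neurons fire equally."""
    return rates.mean() ** 2 / (rates ** 2).mean()

def dot_product_decode(response, tuning):
    """Assign a single-trial response to the stimulus whose mean firing-rate
    vector has the largest dot product with it: the kind of weighted sum a
    receiving neuron can compute across its synapses."""
    return int(np.argmax(tuning @ response))

# Simulate noisy single trials and measure decoding accuracy.
correct = 0
for _ in range(n_trials):
    s = rng.integers(n_stimuli)
    trial = np.maximum(tuning[s] + rng.normal(0.0, 2.0, n_neurons), 0.0)
    correct += dot_product_decode(trial, tuning) == s

print(f"mean population sparseness a = {np.mean([sparseness(t) for t in tuning]):.2f}")
print(f"dot-product decoding accuracy = {correct / n_trials:.2%}")
```

The dot product decoder corresponds to the weighted sum that a receiving neuron computes across its synapses, which is why this form of decoding is described as neuronally plausible.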

 

A biologically plausible theory and model (VisNet) of invariant visual object recognition in the ventral visual system, closely related to the empirical discoveries (162, 179, 192, 226, 245, 275, 277, 280, 283, 290, 304, 312, 396, 406, 414, 446, 455, 473, 485, 516, 535, 536, 554, B12, 589, 639, B16). This approach is unsupervised, uses slow learning to capture invariances from the statistics of the natural environment, uses only local synaptic learning rules, and is therefore biologically plausible, in contrast to the deep learning approaches with which it is compared (639, B16).
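
As an illustration of the learning principle (a hypothetical minimal sketch in Python with invented layer sizes and parameters, not the published VisNet implementation), the following code combines a temporal trace of post-synaptic activity with a local Hebbian update and competition among output neurons, so that transforms of an object presented close together in time come to activate the same output cells:

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_out = 100, 20
eta, alpha = 0.8, 0.05            # trace persistence and learning rate (invented)
k_winners = 3                     # outputs kept active after competition

w = rng.random((n_out, n_in))
w /= np.linalg.norm(w, axis=1, keepdims=True)   # normalized competitive weights

def present(x, trace, w):
    """One step of trace-rule competitive learning (VisNet-style sketch):
    compete, update the temporal trace, then apply a local Hebbian update
    of presynaptic input x with the post-synaptic activity trace."""
    y = w @ x
    y = np.where(y >= np.sort(y)[-k_winners], y, 0.0)   # soft competition
    trace = (1.0 - eta) * y + eta * trace               # temporal activity trace
    w = w + alpha * np.outer(trace, x)                  # local learning rule
    w /= np.linalg.norm(w, axis=1, keepdims=True)       # weight normalization
    return trace, w

objects = rng.random((5, n_in))   # five hypothetical objects
for epoch in range(50):
    for obj in objects:
        trace = np.zeros(n_out)   # reset the trace between objects
        for t in range(5):        # successive 'transforms' of the same object
            view = obj + 0.1 * rng.random(n_in)
            trace, w = present(view, trace, w)

# After learning, transforms of one object should activate the same output:
for i, obj in enumerate(objects[:2]):
    winners = [int(np.argmax(w @ (obj + 0.1 * rng.random(n_in)))) for _ in range(3)]
    print(f"object {i}: winning output neuron across transforms = {winners}")
```

The trace term is what makes the learning slow in the relevant sense: because successive views of the same object occur nearby in time, their weight updates are tied together, which is how invariance is captured from the temporal statistics of the visual input.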


A theory and model of coordinate transforms in the dorsal visual system using a combination of gain modulation and slow or trace rule competitive learning. The theory starts with retinal position inputs gain-modulated by eye position to produce a head-centred representation, followed by gain modulation by head direction, and then by place, to produce an allocentric representation in spatial view coordinates useful for the idiothetic update of hippocampal spatial view cells (612). These coordinate transforms are used for self-motion update in the theory of navigation using hippocampal spatial view cells (633, 662, B16).
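
The first stage of this transform can be sketched as follows (a hypothetical minimal illustration in Python, not the model of 612): retinal position tuning is multiplied by an eye-position gain field, and a downstream unit that sums the appropriate combinations responds to head-centred location h = r + e regardless of where the stimulus falls on the retina:

```python
import numpy as np

positions = np.linspace(-30, 30, 61)         # candidate positions in degrees

def gauss(x, mu, sigma=5.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def gain_field(r, e, r_pref, e_pref):
    """One gain-field unit: retinal position tuning multiplicatively
    gain-modulated by eye position."""
    return gauss(r, r_pref) * gauss(e, e_pref, sigma=10.0)

def head_centred_unit(r, e, h_pref):
    """A downstream unit summing the gain-field units whose preferred
    retinal + eye positions add up to h_pref, so that it is tuned to
    the head-centred location h = r + e."""
    return sum(gain_field(r, e, rp, h_pref - rp) for rp in positions)

# The first three (r, e) pairs all place the stimulus at head-centred
# location h = +10 via different retinal and eye positions; the last
# pair does not, and evokes a weaker response.
for r, e in [(0, 10), (10, 0), (-5, 15), (0, -10)]:
    print(f"r={r:+3d}, e={e:+3d} (h={r + e:+3d}) -> "
          f"response {head_centred_unit(r, e, 10):6.2f}")
```

Repeating the same gain-modulation step with head direction, and then with place, yields the allocentric spatial view representation described above.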


The effective connectivity of the human visual cortical streams, measured using the HCP-MMP human brain atlas, has identified several distinct streams (656, B16, 682, 676). A Ventrolateral Visual ‘What’ Stream for object and face recognition projects hierarchically to the inferior temporal visual cortex, which in turn projects to the orbitofrontal cortex for reward value and emotion, and to the hippocampal memory system. A Ventromedial Visual ‘Where’ Stream for scene representations connects to the parahippocampal gyrus and hippocampus. This is a new conceptualization of 'Where' processing for the hippocampal system, which revolutionizes our understanding of hippocampal function in primates including humans in episodic memory and navigation (682, 662). A Dorsal Visual Stream connects via V2 and V3A to MT+ Complex regions (including MT and MST), which connect to intraparietal regions (including LIP, VIP and MIP) involved in visual motion and actions in space; it performs the coordinate transforms for idiothetic update of Ventromedial Stream scene representations (682, 662). An Inferior bank STS (superior temporal sulcus) cortex Semantic Stream receives from the Ventrolateral Visual Stream, from visual inferior parietal PGi (655), and from the ventromedial prefrontal cortex reward system (649), and connects to language systems (682, 654). A Superior bank STS cortex Semantic Stream receives visual inputs from the Inferior bank STS stream, PGi, and STV, and auditory inputs from A5; it is activated by face expression, motion, and vocalization, is important in social behaviour, and connects to language systems (656, B16, 682, 654).


Binaural sound recording to allow 3-dimensional sound localization (11A, UK provisional patent, Binaural sound recording, B16).