How Do Mammals Mentally Keep Track of Where Things Are in the Visual Scene?
In practice, it is hard to get an artificial system to reliably locate and identify objects in a 2-dimensional image. Now consider a real mammal navigating through space with a constantly moving visual scene, and the problem becomes far more difficult. Many theories of how mammals visually identify an object's location in space have focused on inferring position or relative motion from the visual scene alone (i.e., only the information that reaches the retina is considered). Remember that the visual scene sensed by the eye is only 2-dimensional because the retina itself is 2-dimensional; the scene is in fact 3-dimensional, and both the eye and the scene are constantly moving. There are a host of monocular 3-dimensional cues that are purely visual, such as contrast and 3-D structure-from-motion.
Below are a series of figures showing a part of the primate brain (parietal cortex) that uses both retinal and motor (eye position) cues to represent spatial location. This is an alternative way to represent 3-dimensional space. Neural activity was monitored in parietal area 7a while the subject viewed a computer screen displaying various patterns of motion. We tracked eye position and had subjects move their eyes to 9 different locations. Using this procedure, we find that eye position changes the neuronal response to visual stimuli. Both eye position and object position are encoded across the population of neurons within parietal cortex. Furthermore, we find that a third cue (motion pattern) converges on this population of neurons. The computational outcome of the convergence between retinotopic and eye position cues is a mental representation of 3-dimensional space (model below). The representation of motion pattern by parietal neurons (Fig. 3) and its interactions with eye position suggest that parietal neurons may integrate categorical representations of motion or object shape with the spatial location of moving objects.
In real life, an animal's eyes are always moving, so the 2-dimensional map of object location (in Cartesian coordinates) is ambiguous: how does the brain disambiguate object motion from eye (or body) motion? Our findings (and similar work by Andersen, Georgopoulos, Gnadt, Graziano, Colby, and Goodale) suggest that the brain does so largely by representing information about the visual scene in "body-centered" coordinate systems.
According to the model outlined below, output from parietal cortical neurons can be used to identify where an object is relative to the viewer.
Figure 1: A series of neural responses elicited by different patterns of motion. This neuron fired strongly (black bars) when the subject viewed expansion motion. Note how little firing was induced by rotation, compression or spiralling patterns of motion (icons indicate motion pattern).
Figure 2: Another neuron fires when the subject views left/rightward rotation motion but not radial motion (black bars). The neuron has the same pattern selectivity when the display is shifted 5 mm to a new retinotopic position (white bars). Thus, the pattern selectivity is position invariant (in Cartesian space).
Figure 3: Parietal cells also show a preferred eye position for viewing a moving stimulus. Thus, the neural response to a visual stimulus changes as a function of angle of gaze (i.e., the eye position).
Figure 1

Figure 2

Angle of Gaze Gain Fields in Area 7a Change with Type of Visual Stimulus Viewed (Read and Siegel, 1997)
Optic flow pattern selectivity is position invariant, as described above, and yet the gaze gain fields change as a function of visual stimulus properties. This suggests several things. If the gain field had simply increased or decreased in magnitude with the change in visual stimulus properties, then one could suggest that optic flow or moving visual patterns may be intrinsically more "sa[...] for optic flow selectivity. This result is of interest because it suggests that "shape", "vector flow" or "form" cues are represented in area 7a.
Figure 3
A) Retinotopic receptive field for a square of light
B) Angle of gaze (AOG) gain field for a square of light
C) AOG gain field for a pattern of rotating points of light

Figure 7 from Siegel and Read, 1997: Simple model for constructing a spatial representation. (A) Physiological optics define how the eye position and the location of the target in space interact to yield the location of the target on the retina. A fixation point is located up 10° on the vertical meridian. A stimulus is located 15° below the animal. The retinal image of the target is thus 25° from the fovea. (B) The firing rate of the modeled neuron is plotted as a function of the retinotopic location of a stimulus for different eye positions. Note that the receptive field remains centered on the retina while the amplitude is linearly modulated by eye position, i.e., as the eye position is moved up, the peak of the retinotopic response scales linearly. This is the basis of the gain field. (C) The firing rate is plotted as a function of the eye position for different targets in head-centered coordinates. Note that as the target's location is varied, the center of the response in eye coordinates changes, as does the strength of the response. (D) Plot of the integrated activity of a single area 7a neuron. The response of this neuron varies linearly with the location of the target in head-centered coordinates. The constants for Equation 5 and panels B and C were Hz/deg, Hz, , . The constants for Equation 6 and panel D were determined by trapezoidal integration of Equation 5: Hz, Hz-deg.
from Siegel and Read (1997), Cerebral Cortex
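
The behavior described in panels A to D can be illustrated with a short Python sketch. It is not the model from Siegel and Read (1997): Equations 5 and 6 are not reproduced on this page, so the functional form used here (a Gaussian retinotopic receptive field multiplied by a gain that is linear in eye position) is an assumption chosen to match the caption's description, and every constant and function name is hypothetical. Under those assumptions, integrating the response over eye position with the trapezoidal rule gives a value that grows linearly with the head-centered target location, as panel D describes.

```python
import math

# Hypothetical constants for illustration only (NOT the published values).
GAIN_SLOPE = 0.2    # Hz/deg, assumed slope of the eye-position gain
GAIN_OFFSET = 20.0  # Hz, assumed gain at straight-ahead gaze
RF_CENTER = 0.0     # deg, assumed retinal location of the receptive-field peak
RF_WIDTH = 5.0      # deg, assumed standard deviation of the receptive field


def retinal_location(target_head_deg, eye_deg):
    """Panel A geometry: retinal position = head-centered target - eye position."""
    return target_head_deg - eye_deg


def firing_rate(retinal_deg, eye_deg):
    """Assumed gain-field form: Gaussian receptive field scaled by a linear gain."""
    gain = GAIN_SLOPE * eye_deg + GAIN_OFFSET
    rf = math.exp(-((retinal_deg - RF_CENTER) ** 2) / (2.0 * RF_WIDTH ** 2))
    return gain * rf


def integrated_activity(target_head_deg, lo=-60.0, hi=60.0, n=3001):
    """Trapezoidal integral of the firing rate over eye position (Hz-deg)."""
    step = (hi - lo) / (n - 1)
    eyes = [lo + i * step for i in range(n)]
    rates = [firing_rate(retinal_location(target_head_deg, e), e) for e in eyes]
    return step * (sum(rates) - 0.5 * (rates[0] + rates[-1]))


# Caption example: fixation 10 deg up, target 15 deg down.
print("retinal location (deg):", retinal_location(-15.0, 10.0))

# Integrated activity for several head-centered target locations.
for target in (-20.0, -10.0, 0.0, 10.0, 20.0):
    print(target, round(integrated_activity(target), 2))
```

With this assumed form the integral has a closed-form value, (GAIN_SLOPE * (target - RF_CENTER) + GAIN_OFFSET) * RF_WIDTH * sqrt(2*pi), so the printed values should rise by equal steps for equal steps in target location. The slope then carries units of Hz and the offset units of Hz-deg, the same units the caption lists for Equation 6.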

Model of Gain Effect (Siegel and Read, 1997)