Depth Perception with Interocular Blur Differences based on a Spiking Network

Visual depth perception is a basic function of the visual nervous system. For a stimulus in three-dimensional space, the visual nervous system generates a perception of the depth of its position. Experimental data have demonstrated that interocular blur differences can lead to illusory perceptions of the depth of moving stimuli. However, for stimuli with interocular blur differences, the influences of different factors on illusory depth perception remain unclear. To explore these influences, this paper constructs a plastic two-layer k-winner-take-all (k-WTA) spiking network that simulates primary visual cortical responses. When incompatible stimuli are presented to the two eyes in experiments, binocular rivalry can occur in the primary visual cortex and interact with depth perception. To simulate binocular rivalry, the network consists of two parallel visual channels, driven by left-eye and right-eye stimuli, that compete with each other through mutual inhibition. In simulations, a horizontally moving stimulus is filtered with different Gaussian filters to generate paired monocular stimuli with interocular blur differences. The blur strength, the moving direction and the moving speed serve as the varying factors of the moving stimuli. The network updates its dynamics through probabilistic inference, reflecting the impact of each factor on both neural responses and binocular rivalry. The modified responses can simulate the illusory depth perceptions observed in experiments. For stimuli with interocular blur differences, the varying factors modify binocular rivalry in the network, inducing distinct illusory depth perceptions. Based on probabilistic inference, our model provides possible explanations for illusory depth perception with interocular blur differences.


I. INTRODUCTION
Visual motion perception is an important function of our brains and may be influenced by many factors, such as saccadic eye movements, contexts, contrasts and the spatiotemporal frequency [1]-[8]. For a stimulus in three-dimensional space, the visual nervous system generates a perception of the depth of its stereo location.
Particularly, interocular blur differences caused by a prescription lens correction have been found to induce an illusory perception of the depth of a moving stimulus [3]. These illusions may impact public safety. However, the influences of different factors on the depth misperception induced by interocular blur differences are still unclear.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3142044, IEEE Access VOLUME XX, 2017
cortex (V1) during visual perception [9]-[12]. The two eyes receive monocular images and transform visual information into sensory signals [13]-[15]. Starting from V1, these monocular signals rival and fuse into a stable view of the world with depth [16], [17]. Binocular visual mechanisms serve as the basic background in the exploration of stereo visual perception [18]-[20]. When incompatible stimuli are presented to the two eyes, binocular rivalry can occur in the primary visual cortex. Previous experimental studies have demonstrated that binocular rivalry can coexist and interact with depth perception [21]-[23]. Using dichoptic viewing of random-dot patterns with various noisy contrasts, experiments have found that rivalry and stereopsis occur in parallel and independently [21]. For visual plaid patterns, depth perception and rivalry can coexist in different spatial frequency and orientation bands [22]. Stereoscopic depth perception was evident even when incompatible monocular images engaged in binocular rivalry, demonstrating that binocular rivalry and stereopsis can coexist [23]. To explain illusory depth perceptions induced by interocular blur differences, binocular vision should therefore be the key biological background, and binocular rivalry might be a possible mechanism.
Neural responses in the visual cortex are associated with various visual functions, including contour matching, contour detection, associative processing and orientation selectivity [24]-[28]. For the primary visual cortex, the biological background of binocular rivalry has been modeled by spiking networks [9], [17], [29]. Probabilistic inference has been used to explore the principles of neural coding, such as the phenomena of visual perception [30]-[32]. In the probabilistic framework, winner-take-all spiking networks can simulate primary visual cortical activities [33]-[35]. With neural spikes, the learning rules of the connection weights in these networks take the form of Hebbian spike-timing-dependent plasticity (STDP) [36].
Besides, the temporal structure of neural spike trains has been shown to be more conducive to information transmission than rate-based codes [37]. Particularly, the k-winner-take-all (k-WTA) network can simulate the simultaneous spiking responses of multiple neurons, which are commonly observed in experiments [33].
In the exploration of primary visual cortical mechanisms, winner-take-all networks are designed with simplified two-layer structures [33]-[35]. The first-layer neurons transform the visual images into spike trains that are sent to the second-layer network. The second-layer network is designed following the ubiquitous cortical microcircuits in layers 2/3, consisting of pyramidal cells with lateral inhibition [38]. The second-layer neurons act as decoders of the external images.
With plastic connection weights following Hebbian STDP, the WTA network can simulate sparse primary visual cortical responses through probabilistic inference and decode the visual stimuli. Our previous work has explored how different eye movements induce variability in neural responses with the k-WTA spiking network [39]. Yet, these

A. THE STRUCTURE OF THE K-WTA NETWORK
In this section, a two-layer k-WTA spiking network is constructed as shown in Fig. 1 connective probabilities as those in experiments and the previous network about the primary visual cortex [33], [38].
To keep the network consistent with experimental observations, connections among neurons are generated randomly according to these connection probabilities.
The first-layer network contains two populations of excitatory afferent neurons that receive monocular visual information. The two populations consist of $N_L$ and $N_R$ afferent model neurons, respectively, where the subscripts $L$ and $R$ stand for the left and right eyes. The receptive field of each afferent neuron is simulated as a Difference-of-Gaussian filter. Each group of afferent neurons transfers the received visual information through Poisson spikes to the corresponding second-layer excitatory monocular neurons. The receptive fields are located on a grid within the image so that the image is covered, as introduced in a previous study [40]. In the network, the temporal membrane potential of the
z is set to be 0 [33]. The temporal membrane potential and the active state of each right-eye monocular excitatory neuron can be given similarly.
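The first-layer encoding described above can be illustrated with a minimal sketch. It is not the paper's implementation: the kernel size, the two Gaussian widths, the grid stride, the peak firing rate and the time step are all hypothetical values chosen for illustration, and the function names are ours.

```python
import numpy as np

def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.0):
    """Difference-of-Gaussian kernel (all parameter values are assumptions)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    center = np.exp(-r2 / (2 * sigma_center ** 2))
    surround = np.exp(-r2 / (2 * sigma_surround ** 2))
    # Each Gaussian is normalised to unit sum, so the kernel sums to zero.
    return center / center.sum() - surround / surround.sum()

def afferent_rates(image, kernel, stride=4, max_rate=100.0):
    """Place receptive fields on a grid and map rectified responses to rates (Hz)."""
    k = kernel.shape[0]
    rows = range(0, image.shape[0] - k + 1, stride)
    cols = range(0, image.shape[1] - k + 1, stride)
    resp = np.array([[np.sum(image[r:r + k, c:c + k] * kernel) for c in cols]
                     for r in rows])
    resp = np.maximum(resp, 0.0).ravel()   # half-wave rectification (assumed)
    peak = resp.max()
    return max_rate * resp / peak if peak > 0 else resp

def poisson_spikes(rates, dt=0.001, steps=100, rng=None):
    """Bernoulli approximation of Poisson spiking: P(spike per step) = rate * dt."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.random((steps, rates.size)) < rates * dt

# Hypothetical monocular image: a vertical bar on a blank background.
image = np.zeros((32, 32))
image[:, 14:18] = 1.0
rates = afferent_rates(image, dog_kernel())
spikes = poisson_spikes(rates, rng=np.random.default_rng(0))
```

One such population would be built per eye, with each population driving its own group of second-layer monocular neurons.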

The $K$ binocular excitatory neurons receive lateral inputs from the two groups of monocular neurons, as well as from another group of $J$ binocular inhibitory neurons. These binocular inhibitory neurons are interconnected with each other and receive lateral excitatory inputs from each other. The temporal membrane potential of the $k$-th binocular neuron is expressed as the sum of its received inputs:
, an inhibitory neuron is randomly decided to have a connection towards each binocular excitatory neuron or not [33]. Through the random decision of connections, the structure of the network can be consistent with the circuits in the primary visual cortex observed in experiments [38]. EI tj vy is the instantaneous lateral IPSPs from the $j$-th binocular inhibitory neuron among $J$. The − [34].
In the sampling of the hidden active state $h_t$, the second-layer excitatory and inhibitory neural spikes are generated according to (3)-(8), similarly to our previous study [39]. With the temporal second-layer excitatory neural spikes, the network generates the temporal action
The re-expression in Equation (11) is similar to the method in [42]. It means that, for a sequence of dynamics
To make these learning rules more biologically plausible, this paper makes approximations similar to those of a previous study [43]. In this approximation, the modification of each weight is assumed to depend on its current value, the pre- and postsynaptic responses and the temporal reward. For instance, the partial derivative with respect to the feedforward weight $w^{LL}_{kn}$ can be approximated as:
Based on the energy distance in (25), the identification of the set $B$ includes two steps. In the first step, with a given significance level, a test for equal distributions of the two sample sets $A$ and $B$ can be implemented by nonparametric resampling [45]. Then,
Besides, at the end of each simulation, all the clustering sets are compared with each other for merging. Two clustering sets are merged into one set if the likelihood exceeds a predefined value $p_{\max}$, similarly to a previous study [44]. To merge two clustering sets, the size of the merged set is set to the larger of the sizes of the two clustering sets. Then, a raw merged set is generated through the random arrangement and combination of all the components of the two clustering sets.
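The two-sample test on the energy distance mentioned above can be sketched as a permutation test. Equation (25) is not reproduced in this text, so the sketch assumes the standard energy distance $2E\|X-Y\| - E\|X-X'\| - E\|Y-Y'\|$ with the Euclidean norm. The function names, the number of resamples and the toy data are assumptions made for illustration only.

```python
import numpy as np

def _mean_dist(x, y):
    """Mean Euclidean distance over all pairs (one row of x, one row of y)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()

def energy_distance(a, b):
    """Standard energy distance between two sample sets (assumed form of (25))."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    return 2 * _mean_dist(a, b) - _mean_dist(a, a) - _mean_dist(b, b)

def equal_distribution_test(a, b, n_resamples=999, rng=None):
    """Permutation test of H0: A and B come from the same distribution."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n = len(a)
    pooled = np.vstack([a, b])
    observed = energy_distance(a, b)
    exceed = 0
    for _ in range(n_resamples):
        perm = rng.permutation(len(pooled))
        # Reassign pooled samples to two groups of the original sizes.
        if energy_distance(pooled[perm[:n]], pooled[perm[n:]]) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_resamples + 1)
    return observed, p_value

# Toy data (hypothetical): two clearly separated sample sets.
rng = np.random.default_rng(0)
set_a = rng.normal(0.0, 1.0, size=(30, 2))
set_b = rng.normal(5.0, 1.0, size=(30, 2))
stat, p = equal_distribution_test(set_a, set_b, n_resamples=199, rng=rng)
```

Equality of the two distributions would be rejected when `p` falls below the chosen significance level.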
With the size given, the merged set is derived from the raw merged set through systematic sampling. After that, this merged set serves as the clustering set of one stimulus, while the clustering set of the other stimulus is set to be empty. If a stimulus has an empty clustering set at the beginning of a simulation, the first temporally obtained sets of all the images are merged into its clustering set.
In this paper, the visual stimuli are the images used in [3].
The right-eye temporary reconstruction of the given point $(i_1, i_2)$ is obtained similarly.
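The merging of two clustering sets described earlier in this section (target size equal to the larger set, random arrangement of all components, then systematic sampling) can be sketched as follows. The function name and the particular form of systematic sampling (random start, fixed fractional step) are assumptions, and the likelihood check against $p_{\max}$ is not included.

```python
import numpy as np

def merge_clustering_sets(set_a, set_b, rng=None):
    """Merge two sample sets into one whose size is the larger of the two."""
    rng = np.random.default_rng() if rng is None else rng
    set_a, set_b = np.asarray(set_a), np.asarray(set_b)
    target = max(len(set_a), len(set_b))
    # Raw merged set: all components of both sets in random order.
    raw = np.concatenate([set_a, set_b], axis=0)
    raw = raw[rng.permutation(len(raw))]
    if target == 0:
        return raw
    # Systematic sampling: random start in [0, step), then a fixed step.
    step = len(raw) / target
    start = rng.uniform(0.0, step)
    idx = np.floor(start + step * np.arange(target)).astype(int)
    idx = np.minimum(idx, len(raw) - 1)   # guard against float rounding
    return raw[idx]

# Hypothetical clustering sets with 10 and 6 scalar components.
merged = merge_clustering_sets(np.arange(10), np.arange(100, 106),
                               rng=np.random.default_rng(0))
```

Because the step is at least 1, the sampled indices are distinct, so no component of the raw merged set is drawn twice.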
For a point $(i_1, i_2)$, the relationship between
In our simulations, the stimulus moves in a blank environment. In the grayscale image at each timestep, only the pixels corresponding to the stimulus are non-zero.
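A minimal sketch of such a stimulus is given below: a vertical bar moving horizontally over a blank background, filtered with a different Gaussian for each eye. The image size, bar width, Gaussian widths and function names are hypothetical. As a simplification, the blur is applied along the horizontal axis only, which approximates a two-dimensional Gaussian filter for a bar that spans the full image height.

```python
import numpy as np

def moving_bar_frames(height=32, width=64, bar_width=4, speed=1,
                      direction=+1, steps=20):
    """Frames of a vertical bar; only stimulus pixels are non-zero."""
    frames = np.zeros((steps, height, width))
    start = 0 if direction > 0 else width - bar_width
    for t in range(steps):
        left = start + direction * speed * t
        lo, hi = max(left, 0), min(left + bar_width, width)
        if lo < hi:
            frames[t, :, lo:hi] = 1.0
    return frames

def gaussian_blur_rows(frames, sigma):
    """Blur each image row with a 1-D Gaussian; sigma <= 0 leaves frames sharp."""
    if sigma <= 0:
        return frames.copy()
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), -1, frames)

def dichoptic_pair(frames, sigma_left=2.0, sigma_right=0.0):
    """Paired monocular stimuli with an interocular blur difference."""
    return gaussian_blur_rows(frames, sigma_left), gaussian_blur_rows(frames, sigma_right)

# Left-eye bar blurry, right-eye bar sharp, moving left to right at 1 pixel/timestep.
frames = moving_bar_frames(steps=10)
left_eye, right_eye = dichoptic_pair(frames)
```

Changing `direction` to `-1` or increasing `speed` gives the other stimulus conditions, and swapping the two sigma values blurs the right eye instead.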
Following the assumption in [48], the temporal depth error

A. BINOCULAR RIVALRY AND RESPONDING MODULATION INDUCED BY MOVING STIMULI
In this section, the spiking network receives opposite motion traces of the same moving stimulus. The network is trained and tested to simulate binocular rivalry for different moving stimuli.
The moving stimulus is the one used in [3]. To explore how the interocular blur differences affect depth perception, the temporal depth perceptions for each binocular stimulus are averaged over time and simulations according to (34). As shown in Fig. 3

C. MOVING DIRECTIONS AFFECT DEPTH PERCEPTION OF MOVING STIMULI
In this section, we explore how the moving direction affects the depth perception of our network. The moving bar has two opposite directions, from left to right or from right to left. The bar moves with a common speed of 1 pixel/timestep, which is shown in Fig. 4

D. MOVING SPEEDS AFFECT DEPTH PERCEPTION OF MOVING STIMULI
In this section, we explore how the moving speed affects the depth perception of our network. For the stimulus with the interocular blur difference, the left-eye bar is designed to be blurry while the right-eye bar is sharp, which is shown in Fig. 5