Non‐stimulated regions in early visual cortex encode the contents of conscious visual perception

Abstract Predictions shape our perception. The theory of predictive processing posits that our brains make sense of incoming sensory input by generating predictions, which are sent back from higher to lower levels of the processing hierarchy. These predictions are based on our internal model of the world and enable inferences about the hidden causes of the sensory input data. It has been proposed that conscious perception corresponds to the currently most probable internal model of the world. Accordingly, predictions influencing conscious perception should be fed back from higher to lower levels of the processing hierarchy. Here, we used functional magnetic resonance imaging and multivoxel pattern analysis to show that non‐stimulated regions of early visual areas contain information about the conscious perception of an ambiguous visual stimulus. These results indicate that early sensory cortices in the human brain receive predictive feedback signals that reflect the current contents of conscious perception.


| INTRODUCTION
Predictions play an important role in perception (de Lange, Heilbron, & Kok, 2018). According to the theory of predictive processing, our brains use an internal model of the world to make predictions that are fed back from higher to lower levels of the processing hierarchy, thereby enabling inferences about the hidden causes of the sensory input data (Friston, 2005; Rao & Ballard, 1999). This framework might provide the key to a neuroscientific account of conscious perceptual experiences, one of the greatest challenges for theories of human brain function. Within the framework of predictive processing, it has been proposed that conscious perception corresponds to the currently most probable internal model of the world, that is, the model that makes the best predictions about the incoming sensory data (Hohwy, Roepstorff, & Friston, 2008). From this conceptualization of conscious perception as reflecting a predictive model, it follows that predictions generated by this model should be fed back from higher to lower levels of the processing hierarchy.
In the current study, we investigated whether predictive feedback signals that reflect the current contents of conscious perception can be observed in non-stimulated regions of human early visual cortex.
Non-stimulated visual regions do not receive any bottom-up stimulation; therefore, any information in these regions must come from higher visual areas through feedback connections. This approach has successfully been used in several previous studies, showing for example that feedback signals contain information not only about which visual scene is presented (Smith & Muckli, 2010), but also about the spatial frequency of the scene (Revina, Petro, & Muckli, 2017). High-

| Stimuli
Plaid stimuli were created by superimposing two individual component square-wave gratings (van Kemenade, Seymour, Christophel, Rothkirch, & Sterzer, 2014). The stimuli were designed to be perceptually ambiguous, yielding bistable perception with spontaneous alternations between perception of either the two components moving in different directions ('component perception') or of one pattern moving in the average direction of the two gratings ('pattern perception').
The angle between the components could be 60° or 150°, but for both angles the average motion direction between the two gratings was horizontal, either leftward or rightward, resulting in four stimulus configurations (60° left, 60° right, 150° left, 150° right) that all elicited bistability between component and pattern perception (see Figure 1a). fMRI results were pooled across these four stimulus configurations, as they were not relevant to the purpose of the present study. The individual gratings had a spatial frequency of 0.5 cycles per degree of visual angle and a duty cycle of 0.3. The term 'duty cycle' refers to the proportion of the width of the darker bars within one cycle of the grating. The speed of the individual gratings was 1.3 cycles/s for the 60° stimuli, and 0.39 cycles/s for the 150° stimuli.
The speed of the resulting plaid stimuli was 1.5 cycles/s for all stimulus configurations. The plaid stimuli were presented within a centred annulus with a diameter of 13° of visual angle, and the upper right quadrant was occluded, that is, had the same luminance as the background (Figure 1a). In the centre of the annulus, which had a diameter of 3°, a fixation cross was presented. The background surrounding the stimuli had a luminance of 40 cd/m². The luminance of the gratings of the 150° stimuli was 14 cd/m². For the 60° stimuli, the two component gratings differed in luminance: one grating had 2 cd/m², the other 20 cd/m². The luminance of the intersections of the gratings was determined in pilot experiments that aimed at approximate equiprobability of component and pattern perception for all stimulus types and resulted in an intersection luminance of 9 cd/m² for the 150° stimuli and 2 cd/m² for the 60° stimuli.
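The reported component and plaid speeds can be related by a short calculation. The sketch below assumes that the stated angle is the angle between the two component motion directions and that both components move at equal speed, so that the pattern speed is the component speed divided by the cosine of half that angle. The function name is hypothetical and chosen for illustration only.

```python
import math

def pattern_speed(component_speed, angle_between_deg):
    """Pattern speed of a symmetric plaid from the speed of one component.

    Assumption: both components move at the same speed, at plus and minus
    half the given angle from the pattern direction.
    """
    half_angle = math.radians(angle_between_deg / 2.0)
    return component_speed / math.cos(half_angle)

# Component speeds reported in the text (cycles/s).
for angle, speed in [(60, 1.3), (150, 0.39)]:
    print(f"{angle} deg plaid: {pattern_speed(speed, angle):.2f} cycles/s")
```

Under these assumptions, both configurations come out at roughly 1.5 cycles/s, which agrees with the plaid speed stated above.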

| Procedure
The stimuli were presented on a screen at the end of the MRI scanner bore. Participants lay in the scanner in a supine position and viewed the stimuli on the screen through an angled mirror. They were asked to fixate on the central fixation cross and report their percept (pattern or component perception) by button presses. They had to report their percept as soon as the stimulus was presented, and press a button any time their percept changed. A pattern percept was reported with the right index finger, and a component percept with the right middle finger. Each run comprised eight trials, lasting 60 s each, during which a plaid stimulus was continuously presented in one of the four stimulus configurations.
Each trial was followed by a 10 s fixation interval, during which only the fixation cross was presented. Each stimulus configuration was presented twice per run in pseudorandomised order. There were six runs in total. After the main experiment, two functional localisers were presented. The first was a stimulus localiser. Here, each stimulus from the main experiment was presented for 12 s, followed by fixation for 8 s, in a block design. Unlike in the main experiment, participants were asked to fixate only and not report their perception. All conditions were presented four times in total. This functional stimulus localiser allowed for selection of voxels that were activated by the stimuli used in the main experiment. Furthermore, we used a functional localiser that mapped the non-stimulated region and was designed to preclude any spill-over of activity from the stimulated region, similar to the localiser of Smith and Muckli (2010). During this localiser, participants viewed contrast-reversing checkerboard stimuli (4 Hz), which were again presented for 12 s each, followed by 8 s of fixation. Each condition was repeated 8 times. The localiser contained 'surround stimuli', mapping the border between stimulated and non-stimulated regions, and 'target stimuli', mapping the non-stimulated region. The surround stimulus was presented at 0.5° of visual angle diagonally from the fixation cross, mapping the outer 1° of the non-stimulated quadrant. The checkerboard representing the non-stimulated quadrant, that is, the target stimulus, was presented at 1° diagonally from the surround stimulus (see Figure 1b). Thus, the target region, from which voxels were selected for our decoding analysis of the non-stimulated quadrant, was approximately 2° away from the stimulated region. The scanning session ended with a structural T1 scan (MPRAGE). Standard phase-encoded retinotopic mapping was performed in a separate scanning session to define regions V1-3.

| Scanning parameters
Functional MRI data were acquired using a 3 T TIM Trio scanner (Siemens, Erlangen, Germany), equipped with a 12-channel head-coil.
A gradient echo EPI sequence was used (TR: 2 s, TE: 30 ms, flip angle: 80°, slice thickness: 2.3 mm, gap: 10%, voxel size 2.3 × 2.3 × 2.53 mm). A total of 280 volumes were acquired for each run of the main experiment, 163 volumes for the stimulus localiser, 163 volumes for the non-stimulated quadrant localiser, 123 volumes per run (3 in total) for the polar angle retinotopic mapping, and 102 volumes per run (3 in total) for eccentricity mapping, each containing 29 slices oriented parallel to the calcarine sulcus and acquired in ascending order. Anatomical images were obtained using an MPRAGE sequence (TR: 1.9 s, TE: 2.52 ms, flip angle: 9°).
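The number of volumes per main-experiment run is consistent with the trial timing given in the Procedure section. The following sketch reproduces that arithmetic, assuming that a run consists only of the eight 60 s trials and their 10 s fixation intervals, with no additional volumes counted. The function and variable names are illustrative.

```python
TR_S = 2.0            # repetition time from the scanning parameters
TRIALS_PER_RUN = 8    # trials per run of the main experiment
TRIAL_S = 60.0        # plaid presentation per trial
FIXATION_S = 10.0     # fixation interval after each trial

def volumes_per_run(n_trials, trial_s, fixation_s, tr_s):
    """Number of volumes needed to cover all trials plus fixation intervals."""
    return int(round(n_trials * (trial_s + fixation_s) / tr_s))

main_run_volumes = volumes_per_run(TRIALS_PER_RUN, TRIAL_S, FIXATION_S, TR_S)
print(main_run_volumes)  # 280
```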

| Eye movements
FIGURE 1 Stimuli and main results. (a) Ambiguous moving plaid stimuli were presented in four different stimulus configurations, which differed in the angle between the two component gratings (60° or 150°) and the overall motion direction of the resulting pattern (leftward or rightward). (b) The surround stimulus mapped the border between stimulated and non-stimulated regions, and the target stimulus mapped the non-stimulated quadrant (each presented in separate blocks, separated by fixation blocks). (c) Classifier accuracy discriminating component and pattern perception across all stimulus configurations for stimulated and non-stimulated regions of early retinotopic areas. Error bars represent 95% confidence interval (CI). *p < .05, **p < .01, ***p < .001

Eye movements were recorded with an iView X™ MRI-LR system [SensoMotoric Instruments (SMI), Teltow, Germany] using a sampling rate of 50 Hz. Due to technical difficulties, no usable eye tracking data were obtained for four participants, and for one run of a fifth participant. The eye tracking data were used in a control analysis to discard runs with poor fixation performance. To determine fixation performance, a radius of 1.5° from fixation was defined as the fixation area.
Eye movements beyond this area were considered as outliers. Data were detrended and mean-corrected to determine the number of these outliers, and runs in which eye movements extended beyond 1.5° of fixation in more than 5% of all data points were excluded. A total of 10 runs distributed across 5 participants were excluded in the control analysis based on eye tracking exclusion criteria.
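The fixation-based exclusion rule in this section can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes gaze samples expressed in degrees of visual angle, a linear detrend per axis, and that the mean gaze position after correction is taken as the fixation point. All function names are hypothetical.

```python
import math

def detrend(values):
    """Remove a least-squares linear trend (and thereby the mean) from a series."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values)) / denom
    return [v - v_mean - slope * (t - t_mean) for t, v in enumerate(values)]

def fraction_outside(x_deg, y_deg, radius_deg=1.5):
    """Fraction of samples farther than radius_deg from the corrected centre."""
    x, y = detrend(x_deg), detrend(y_deg)
    outside = sum(1 for xi, yi in zip(x, y) if math.hypot(xi, yi) > radius_deg)
    return outside / len(x)

def exclude_run(x_deg, y_deg, radius_deg=1.5, max_fraction=0.05):
    """True if more than max_fraction of samples fall outside the fixation area."""
    return fraction_outside(x_deg, y_deg, radius_deg) > max_fraction

# Toy run: every tenth sample is a 5 degree horizontal excursion.
x = [5.0 if t % 10 == 0 else 0.0 for t in range(1000)]
y = [0.0] * 1000
print(exclude_run(x, y))  # True, about 10% of samples lie outside
```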

| fMRI analysis
The fMRI data were preprocessed and analysed using SPM12. First, the functional images were realigned to correct for head motion, after which they were coregistered with the structural image obtained in the same session. Then, both functional and structural images were coregistered with the structural image obtained in the retinotopy session. No normalisation or smoothing was applied, as is common for studies using multi-voxel pattern analysis (MVPA). A general linear model (GLM) was set up in which each regressor modelled all trials belonging to a given stimulus configuration and percept, resulting in eight regressors of interest. Motion parameters as well as a regressor modelling fixation in between trials were included as regressors of no interest. If participants reported only one percept for a certain condition, the other percept of that condition could not be modelled in that run; therefore, such runs were excluded. This affected all runs from one participant, and another seven runs distributed across three participants.

| ROI definition
Regions of interest (ROIs) were defined with similar methods as those used by Smith and Muckli (2010). First, regions V1-V3 were defined using standard retinotopic mapping procedures. Within regions V1-V3, only the voxels that showed significant positive response to the stimulated region (t-contrast stimulus > fixation, p < .01 uncorr.) in our stimulus localiser were selected. For the non-stimulated region, the following procedure was used. First, we defined a region from the contrast non-stimulated target area > surround (p < .01 uncorr). Then, in order to ensure that these voxels were not responsive to the stimulated region, we further selected from this region only the voxels that met these criteria: significant positive response to the non-stimulated target area alone (t > 1.65, p < .01 uncorr.), no significant response to the stimulated area alone (t < 1.65, p > .01 uncorr.), and no significant response to the surround region (t < 1.65, p > .01 uncorr.).
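The voxel selection for the non-stimulated ROI amounts to a conjunction of criteria. The sketch below illustrates that logic on per-voxel t-values. It assumes that the initial target > surround contrast has already been thresholded into a boolean mask and that the quoted value of 1.65 is applied to each localiser condition. The function name, argument names, and toy values are hypothetical.

```python
def select_nonstimulated_voxels(in_target_vs_surround, t_target, t_stimulus,
                                t_surround, t_threshold=1.65):
    """Return indices of voxels that satisfy every non-stimulated ROI criterion.

    in_target_vs_surround: booleans, True where the voxel survived the initial
        target > surround contrast (thresholding assumed done beforehand).
    t_target, t_stimulus, t_surround: per-voxel t-values for each localiser
        condition (hypothetical inputs).
    """
    selected = []
    for i, in_initial in enumerate(in_target_vs_surround):
        responds_to_target = t_target[i] > t_threshold
        silent_to_stimulus = t_stimulus[i] < t_threshold
        silent_to_surround = t_surround[i] < t_threshold
        if (in_initial and responds_to_target
                and silent_to_stimulus and silent_to_surround):
            selected.append(i)
    return selected

# Toy example with five voxels; only voxel 0 meets all criteria.
roi = select_nonstimulated_voxels(
    [True, True, True, False, True],
    t_target=[3.0, 2.0, 1.0, 3.0, 2.5],
    t_stimulus=[0.2, 2.0, 0.1, 0.0, -0.5],
    t_surround=[0.5, 0.3, 0.2, 0.1, 1.9],
)
print(roi)  # [0]
```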
The stimulated ROIs were naturally larger than the non-stimulated ROIs, as the stimulus spanned three quadrants compared to one occluded quadrant. Furthermore, our strict criteria for selecting non-stimulated voxels outlined above meant we only selected a small sample of the voxels corresponding to the occluded quadrant. To correct for potential biases induced by this difference in ROI size, we performed an additional control analysis with smaller stimulated ROIs that had the same number of voxels as their non-stimulated counterpart ROI. These ROIs were generated by manually selecting voxels corresponding to the stimulus quadrant immediately opposite the occluded quadrant, in our case the quadrant in the upper left visual field. As such, we selected voxels in the right hemisphere below the calcarine sulcus. From these voxels, we randomly selected n voxels, with n being the number of voxels of the non-stimulated ROI for that particular visual area (V1-V3) and participant. For two participants, not enough voxels were available in the respective stimulated quadrant of V1 to match the number of voxels from the non-stimulated V1 ROI. For these two participants, we therefore used all the voxels available in the stimulated quadrant and thus had fewer voxels in the stimulated V1 ROI than in the non-stimulated V1 ROI (for one participant 12 stimulated voxels vs. 15 non-stimulated voxels, for the other participant 6 stimulated voxels vs. 24 non-stimulated voxels).
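The size-matching step, including the fallback used when too few stimulated voxels were available, can be illustrated as below. This is a sketch under the assumption that voxels are identified by arbitrary indices and sampled uniformly without replacement. The function name and the seed parameter are hypothetical.

```python
import random

def size_matched_sample(candidate_voxels, n_target, seed=None):
    """Randomly pick n_target voxels, or return all candidates if too few exist."""
    candidates = list(candidate_voxels)
    if len(candidates) <= n_target:
        # Fallback described in the text: use every available voxel.
        return candidates
    rng = random.Random(seed)
    return rng.sample(candidates, n_target)

# Enough candidates: the sample matches the non-stimulated ROI size.
print(len(size_matched_sample(range(100), 15, seed=0)))  # 15
# Too few candidates (12 available, 15 requested): all 12 are used.
print(len(size_matched_sample(range(12), 15)))  # 12
```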
Data from a standard hMT+/V5 localiser were available for 10 of our subjects. Individual hMT+/V5 ROIs were defined by selecting voxels from the contrast moving dots > static dots (p < .001 uncorr.) whilst taking anatomical landmarks into account (Dumoulin, 2000).

| MVPA
MVPA was performed using The Decoding Toolbox (Hebart, Görgen, & Haynes, 2015), which implements the LibSVM software (http://www.csie.ntu.edu.tw/wcjlin/libsv). A linear support vector machine was trained to discriminate pattern from component percepts based on the beta images resulting from the GLM. As the GLM already included grand mean scaling of the data, no additional scaling was performed. The classification was performed for each stimulus configuration separately. Classifier performance was tested using a leave-one-run-out cross-validation approach. Training was carried out on all but one run, which served as the test data. This was repeated until all runs had served as a test run once. The decoding accuracy was averaged across cross-validation folds and then across conditions. Permutation testing was conducted to determine the significance at the group level as described by Stelzer, Chen, and Turner (2013). In brief, we provided the classifier with all possible combinations of shuffled label assignments for each participant and performed the decoding procedure for each label assignment. Then, we randomly selected one of these decoding accuracies from each participant and calculated the mean decoding accuracy. This procedure of random selection and calculation of mean decoding accuracy was repeated 10,000 times to generate a distribution of decoding accuracies. We then used a cut-off of 95% to determine significance of our results.
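The leave-one-run-out splits and the group-level permutation step described in this section can be sketched in a few lines. The example is illustrative only: it does not reproduce The Decoding Toolbox or the support vector machine, it assumes that per-participant accuracies under shuffled labels have already been computed, and it interprets the 95% cut-off as the 95th percentile of the group-level null distribution. All names and toy values are hypothetical.

```python
import math
import random

def leave_one_run_out(runs):
    """Yield (training_runs, test_run) pairs; each run is the test run once."""
    for i in range(len(runs)):
        yield runs[:i] + runs[i + 1:], runs[i]

def group_null_distribution(perm_accuracies, n_draws=10000, seed=0):
    """Build a null distribution of group-mean accuracies.

    perm_accuracies: one list per participant holding the decoding
        accuracies obtained with shuffled labels.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_draws):
        # Draw one permuted accuracy per participant and average them.
        draws = [rng.choice(acc) for acc in perm_accuracies]
        null.append(sum(draws) / len(draws))
    return null

def cutoff(null, level=0.95):
    """Value at the given percentile of the null distribution."""
    ordered = sorted(null)
    index = min(len(ordered) - 1, int(math.ceil(level * len(ordered))) - 1)
    return ordered[index]

# Toy data: three participants with made-up permuted accuracies.
perm = [[0.4, 0.5, 0.6], [0.45, 0.5, 0.55], [0.4, 0.5, 0.6]]
null = group_null_distribution(perm)
observed_group_mean = 0.62  # made-up observed accuracy
print(observed_group_mean > cutoff(null))
```

An observed group-mean accuracy above the cut-off would be treated as significant under this reading of the procedure.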

| Univariate analysis
In order to further understand the neural mechanisms involved, we additionally performed a univariate analysis contrasting component with pattern percepts and vice versa. To this end, we used the same native-space data used for our MVPA, with the same GLM.
We extracted the beta values for the contrasts patterns > baseline and components > baseline from the respective native-space ROIs for each subject. We then performed repeated-measures ANOVAs on these beta values with the factors Region (stimulated vs. non-stimulated) and Percept (patterns vs. components). As in the multivariate approach, we first analysed the ROIs comprising V1-V3, and then analysed each region separately.
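For a fully within-subject 2 × 2 design such as Region × Percept, the interaction test of a repeated-measures ANOVA is equivalent to a one-sample t-test on each subject's difference of differences, with F(1, n − 1) = t². The sketch below illustrates this equivalence on made-up beta values. It is not the authors' analysis code, and the function and argument names are hypothetical.

```python
import math

def interaction_f(stim_pattern, stim_component, nonstim_pattern, nonstim_component):
    """F statistic and degrees of freedom for the Region x Percept interaction.

    Each argument holds one beta value per subject, in the same subject order.
    """
    contrast = [
        (sp - sc) - (npat - ncomp)
        for sp, sc, npat, ncomp in zip(
            stim_pattern, stim_component, nonstim_pattern, nonstim_component
        )
    ]
    n = len(contrast)
    mean = sum(contrast) / n
    variance = sum((c - mean) ** 2 for c in contrast) / (n - 1)
    t = mean / math.sqrt(variance / n)
    return t * t, (1, n - 1)

# Made-up beta values for three subjects, for illustration only.
f_value, dof = interaction_f([1.0, 2.0, 3.0], [0.0, 0.0, 0.0],
                             [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(round(f_value, 2), dof)  # 12.0 (1, 2)
```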
We performed the same analysis on our hMT+/V5 ROIs, where we expected to see more activity for components than patterns, as shown by previous studies (Castelo-Branco et al., 2002; Grassi et al., 2018).

| Phase durations
The mean perceptual phase duration of the 60° stimuli (averaged across leftward and rightward moving stimuli) was 7.4 s for components (SD = 8.6) and 9.9 s for patterns (SD = 4.6). For the 150° stimuli, mean phase duration for components was 8.2 s (SD = 7.5) and for patterns 4.9 s (SD = 1.7).

| Control analysis correcting for the difference in number of voxels between stimulated and nonstimulated ROIs
In this analysis, we decoded from stimulated and non-stimulated ROIs that were matched in size. As displayed in Figure 2b

Our ROI analysis on area hMT+/V5 showed significantly more activity for components than patterns [t(9) = −2.33, p = .045, see Figure 3].

| DISCUSSION
Our findings show that the current perceptual state during bistability can be decoded from fMRI signal patterns not only in stimulated early visual regions, which is in line with previous studies (Haynes & Rees, 2005), but crucially also in non-stimulated retinotopic visual cortex, which did not receive any bottom-up input. This suggests that non-stimulated regions of early visual cortex contain information not only about visual stimulation in the surrounding context, as previously shown (Smith & Muckli, 2010), but even about conscious perception independent of visual stimulation per se. This is in line with current theories that model bistable perception within the framework of predictive processing (Brascamp, Sterzer, Blake, & Knapen, 2018; Hohwy et al., 2008). According to this view, ambiguous stimuli (such as the bistable moving plaids used here) provide equally strong sensory evidence for two different percepts, but the currently dominant percept establishes an implicit prediction regarding the cause of the sensory input. This prediction is thought to stabilize the current perceptual state through feedback from higher to lower hierarchical levels, while sensory evidence for the currently suppressed perceptual interpretation elicits prediction errors that act to destabilize the current percept, eventually leading to a perceptual change (Weilnhammer et al., 2021; Weilnhammer, Stuke, Hesselmann, Sterzer, & Schmack, 2017). Here, we provide evidence supporting the notion of feedback signalling of predictions in bistable perception.
Other studies have shown neural activity in visual areas that were not directly stimulated. These include studies on object perception (Williams et al., 2008), feature-based attention (Serences & Boynton, 2007), visual scene perception (Smith & Muckli, 2010), and illusions like the Kanizsa triangle (Kok, Bains, van Mourik, Norris, & de Lange, 2016), apparent motion (Chong, Familiar, & Shim, 2016; Muckli, Kohler, Kriegeskorte, & Singer, 2005), or the bistable Gestalt illusion (Grassi, Zaretskaya, & Bartels, 2017). Our study is in line with this earlier work, which underlines the idea that long-range connections carry feedback signals from higher areas back to early visual cortex. However, it is distinct from these findings in the key aspect that it shows that such feedback signals in non-stimulated visual areas carry information about the subjective interpretation of an ambiguous stimulus, where the physical properties of the stimulus are stable, while the conscious perception of the participant alternates between two alternative interpretations. Bistable motion quartets inducing apparent motion also show activity along the non-stimulated motion path depending on conscious interpretation, but this activity underlies the reconstruction of an illusory percept, that is, of a stimulus that is not actually there. In our study, the activity reflected feedback signals about a stimulus that was always physically present, but was interpreted in different ways over time. As such, our results not only support the general idea that predictions are sent back to early visual cortex, but importantly also indicate that they are involved in the subjective interpretation of an ambiguous stimulus.
Our univariate results showed significantly more activation for patterns than components in non-stimulated early visual areas.
Increased activation for patterns in early visual cortex has been reported in previous studies as well (Grassi et al., 2018; Wilbertz, Ketkar, Guggenmos, & Sterzer, 2018 , 2002), and that it sends information back to early visual cortex during this process (Duarte et al., 2017). Furthermore, effective connectivity analyses have shown that apparent-motion-induced activation of non-stimulated visual regions along the illusory apparent motion path is associated with enhanced feedback signalling from area hMT+/V5 (Sterzer, Haynes, & Rees, 2006), which has been shown to be causally involved in such apparent motion perception in a later TMS study (Vetter, Grosbras, & Muckli, 2015). Considering these studies, it seems plausible that area hMT+/V5 is also involved in predictive feedback signalling to non-stimulated areas during bistable plaid motion perception, and that our results thus reflect predictive feedback signalling coming from this area. Our significant decoding results in hMT+/V5 support the idea that this area generates the predictions that are sent back to early visual areas during bistable perception, though future studies will have to provide direct causal evidence.

FIGURE 3 Results of univariate analysis. Beta values are displayed for patterns and components in each ROI. Early visual areas generally showed increased activity for patterns compared to components in non-stimulated areas. In contrast, we observed more activity for components than patterns in area hMT+/V5. Significance labels are added for post-hoc t-tests (*p < .05, **p < .01, ***p < .001, n.s. = not significant). Since the Region × Percept interaction did not reach significance in V1, no post-hoc t-tests were performed for this region, but the results point in the same direction as the other early visual regions. Error bars represent 95% confidence interval (CI)
There are other potential origins of feedback signalling in bis-  (Brascamp et al., 2018; Grassi et al., 2018; Weilnhammer et al., 2021). Recent evidence suggests that hMT+/V5 might signal perceptual conflict to frontal areas and receive signals from them to resolve this conflict, making hMT+/V5 a hub for receiving and relaying feedback signals from and to frontal cortex (Weilnhammer et al., 2021). As our study was focused on visual cortex, we were unable to verify the involvement of areas outside visual cortex. However, our results support the idea of hMT+/V5 as a source of feedback signals to early visual cortex in bistable perception.
In conclusion, our current results provide compelling support for the notion that conscious perception reflects an internal model that generates predictions about the current state of the world, and that these predictions are fed back to the lowest levels of sensory processing to enable inferences regarding the sensory input.