Direct and indirect viral associations predict coexistence in wild plant virus communities

23 Integration of community ecology with disease biology is viewed as a promising avenue for 24 uncovering determinants of pathogen diversity, and for predicting disease risks. Plant- 25 infecting viruses represent a vastly underestimated component of biodiversity with 26 potentially important ecological and evolutionary roles. We performed hierarchal spatial 27 analysis of wild plant populations to characterise the diversity and coexistence structure of 28 within-host virus communities, and their predictors. Our results show that these virus 29 communities are characterised by single infections of few, dominating virus taxa as well as 30 diverse, non-random coinfections. Using a novel graphical modelling framework we 31 demonstrate that after accounting for environmental heterogeneity at the level of both 32 individual host plants and populations, most virus co-occurrence patterns can be attributed to 33 virus-virus associations. Moreover, we show that conditioning variables changed virus 34 association networks especially through their indirect effects. This highlights a previously 35 underestimated mechanism of how human-driven environmental change can influence 36 disease risks. 37


Abstract 23
Integration of community ecology with disease biology is viewed as a promising avenue for 24 uncovering determinants of pathogen diversity, and for predicting disease risks. Plant-25 infecting viruses represent a vastly underestimated component of biodiversity with 26 potentially important ecological and evolutionary roles. We performed hierarchal spatial 27 analysis of wild plant populations to characterise the diversity and coexistence structure of 28 within-host virus communities, and their predictors. Our results show that these virus 29 communities are characterised by single infections of few, dominating virus taxa as well as 30 diverse, non-random coinfections. Using a novel graphical modelling framework we 31 demonstrate that after accounting for environmental heterogeneity at the level of both 32 individual host plants and populations, most virus co-occurrence patterns can be attributed to 33

Introduction 45
Viruses, like all organisms, occur as communities with other species 1 . Importantly, the 46 diversity of co-occurring viruses may have profound impact on virus evolution and 47 epidemiology 2 . Hence, understanding the conditions and scales at which virus communities 48 vary is critical when predicting how virus communities respond to environmental change or 49 to disease control measures 3-6 . Over the past decade metagenomic surveys exploring virus 50 diversity in wild hosts ranging from plants to insects and mammals 7-11 have revealed a 51 tremendous, largely undescribed, virus diversity across environments. It is becoming clear 52 that the diversity of virus taxa even within the same host can be highly variable, and that viral 53 co-occurrences are common in nature 1,2,12,13 . 54 Predicting how pathogen communities respond to environmental change requires a 55 principled community ecology framework to disentangle possible drivers of coexistence 56 patterns 14,15 . As Popovic et al. 16 recently outlined, there are three main drivers of observed 57 species co-occurrence patterns ( Figure 1). First, two species might share similar responses to 58 environmental variables such as temperature [17][18][19] . Second, two species may exhibit similar 59 responses to the occurrence of a third species: for example, some viruses require another 60 virus species to mediate their replication within a host 20,21 , resulting in an indirect association 61 between the dependent species. Third, two species might exhibit a direct, biotic association: a 62 is that their immediate environment is a living organism in itself, and hence the influence of 73 the abiotic environment (e.g. weather) on pathogens can be either direct or mediated by the 74 host and/or vectors ( Figure 1). Moreover, the distinction between abiotic and biotic effects 75 becomes blurred, as the interaction between a host and a pathogen can be considered both an 76 environmental effect as well as a biotic association (Figure 1). Pathogens can only occur 77 where they have susceptible hosts and thus spatial variation in resistance is expected to have 78 direct impacts on pathogen (co-)occurrence patterns [32][33][34] . 79 Ecological knowledge often relies on observational methods, but inferring signals of biotic 80 interactions from co-occurrence data is challenging 35 . However, studying species associations 81 through the perspective of conditional probabilities helps to overcome some of these 82 challenges. Conditional probability refers to the probability that two species will be found 83 together after controlling for the other species in the network. Implementations for ecological 84

Spatial variables 160
We included spatial variables implemented as Moran's eigenvector maps (MEMs) 51 . We

Descriptive analyses 170
We began with descriptive analyses and illustration of the virus co-occurrence structure. We  We quantified the relationship between the cumulative virus richness and co-occurrence 178 patterns with respect to increasing area sampled by calculating the mean species-area-curve 179 and mean coexistence-curve 55 . Both curves were constructed by randomly selecting one 180 sampled host plant, and then increasing the spatial scale and sample by including the next 181 closest plant to the species richness or co-occurrence calculation. We calculated the mean 182 curves by repeating this 100 times, selecting a different initial plant for each round. We also 183 calculated the maximum number of co-occurring virus pairs from the number of taxa conditional not only on the rest of the virus community, but also on the covariates included in 202 the model (Table 1). 203 The modelling framework is described in detail by Clark et al. 36 . Briefly, we modelled the log-204 odds of detecting virus i given covariate x and occurrence of virus j with 205 where ' ! is the vector of presences and absences of virus i; ' \! denotes the presences and 207 absences of all other viruses except i; 1 !# is the virus-level intercept; and 3 ! $ is the effect of 208 covariate , on the occurrence probability of virus i. Parameters 1 !%# and 3 !% $ represent the 209 associations between viruses, conditional on the occurrences of all the non-focal viruses 210 (other than the focal virus i). 211 We used data on viruses with at least 10 occurrences (16 viruses) in the entire virus 212 community matrix (i.e. minimum prevalence of 2.5% of sampling units). To understand how 213 environmental characteristics and the host affect the virus community, we included several 214 explanatory, conditioning variables in the model (Table 1) Table 1 for full details. Out of the 400 sampled host plants, four samples were discarded due to missing explanatory 264 variable data. Thus, we had 396 sampled plants, resulting in 9 900 unique virus observations. 265 The prevalence of the individual viruses varied from 0.5 to 36% in the whole data. Of the 396 266 plants, 29% were uninfected, 32% hosted a single infection, and 39% of the plants hosted 267 multiple infections, of which 7% consisted of five or more viruses (Figure 2A). 268

Descriptive analyses 269
The virus communities exhibit a significantly nested structure: the mean C-score over the  Table S1. 300 To understand the changes in the network resulting from the addition of conditioning 301 variables, we compared the virus-virus-associations between viruses based on the MRF and 302 the different CRF variants. The MRF revealed mostly positive associations between the 303 viruses ( Figure 4A). After including spatial, habitat and host-related variables (Table 1) Figure 4B). We will refer to these associations as 'permanent'. In this network, 318 Bromoviridae and especially Secoviridae appeared as hubs, with five association links to 319 other viruses. These permanent associations represented direct interactions between viruses 320 that could not be explained with indirect effects of the rest of the virus community nor any 321 combination of additional conditioning variables. 322 Next, to understand how host-and habitat-related variables and spatial configuration of the 323 hosts influence virus community structure, we investigated the direct effects of the additional 324 conditioning variables. All the significant direct effects of the environmental and spatial 325 variables were for either Caulimoviridae or Geminiviridae (Table 2): e.g., increasing 326 agricultural land use in the surrounding landscape increased the occurrence probability of 327 Caulimoviridae, and host population size predicted higher occurrence probability for 328 Geminiviridae.  There were altogether eight significant herbivory-related effect, all of them positive (Table 2). these mechanisms lead to more stable coexistence at increasing spatial scales 55,67 . 367 We found that the conditional network models (CRFs), outperformed the model incorporating 368 only associations between viruses (MRF) in explaining virus community structure. The 369 differences between the CRF model variants were less pronounced, as seen from their equal 370 model fit and performance. Hence, we conclude that there is environmental variation and 371 processes operating at different spatial scales that influence the wild virus community 372 coexistence structure both directly and indirectly. The environmental characteristics and the 373 spatial variables explain community structure almost interchangeably. Previously, e.g. within-374 host diversity of pathogens has been shown to increase with latitude, while pathogen 375 turnover follows an opposite trajectory, suggesting limited transmission in lower latitudes 68 . 376 Our analysis revealed spatial variables to influence associations between viruses, resulting in 377 indirect effects on their co-occurrence. These viruses are vector-dispersed, and most likely 378 dispersal-limited at larger spatial scales 69 , leading to less homogenisation at these scales. 379 However, at the scale of a host populations, variation among host genotypes 70 and 380 demographic stochasticity (through e.g. herbivory, as seen from the indirect effects of 381 herbivore damage on the associations between viruses) are expected to be important, as well 382 as abiotic interactions mediating coexistence 71 . 383 After accounting for the effects of the host and habitat characteristics, as well as spatial 384 structure on virus distributions, non-random, significant positive associations between 385 viruses remain. Although these associations are based on purely observational co-occurrence 386 patterns, given the spatial scale of our sampling and our method for detection that targets the 387 host's defence response 41 , we consider these indicative of biotic interactions between the 388 viruses within a host plant 72 . Positive associations between species along with microhabitat 389 preferences have been suggested to promote the (co)existence of rare species 73 . The 390 associations among coinfecting viruses may involve positive effects on replication 74,75 , or even 391 obligate dependencies as some viruses require their specific helper virus in order to complete 392 their lifecycle 76 . Viruses may also suppress host immunity allowing subsequent infections to 393 escape recognition by host immunity 77 . As expected, the inclusion of explanatory variables 394 reduced the number of direct associations between viruses identified by the models, 395 suggesting that shared environmental responses play an important role in the assembly of the 396 communities (see e.g. 78,79 ). After accounting for host-or habitat-related or spatial variables, 397 support for several direct associations (e.g. Tombusviridae-Alphaflexiviridae, 398 Betaflexiviridae-Potyviridae and Tombusviridae-Betaflexiviridae) was no longer detected. Although our virus community data set along with its environmental explanatory variables is 416 extensive and of high quality, our results are limited by our sample size and its effects on the 417 parameter estimation of our modelling method. We use regularisation to avoid overfitting, but 418 we note that the estimated parameters could in theory change with more data included. 419 However, as seen from the species-area and coexistence-area curves, our sampling effort 420 captures the detected virus diversity already before all the samples have been included, 421 indicating a promising sample size. 422 Our results demonstrate that natural plant virus communities are characterised by single 423 infections of few, dominating virus taxa as well as diverse, non-random coinfections. Virus 424 diversity can be explained by coexistence-promoting mechanisms, some of which we could 425 tease apart with our modelling. We show that host and habitat characteristics, as well as 426 spatial structure, resolve some of the observed co-occurrence patterns, to some degree 427 interchangeably. Importantly, we find that some virus-virus associations are mediated by 428 either host or habitat characteristics, or the spatial structure of the host populations.