Neighbor GWAS: incorporating neighbor genotypic identity into genome-wide association studies of field herbivory on Arabidopsis thaliana

An increasing number of field studies show that the phenotype of an individual plant depends not only on its genotype but also on those of neighboring plants; however, this fact is not taken into consideration in genome-wide association studies (GWAS). Based on the Ising model of ferromagnetism, we incorporated neighbor genotypic identity into a regression model in this study. The proposed method, named “neighbor GWAS”, was applied to simulated and real phenotypes using Arabidopsis thaliana accessions. Our simulations showed that phenotypic variation explained by neighbor effects approached a plateau when an effective spatial scale became narrow. Thus, the effective scale of neighbor effects could be estimated by patterns of the phenotypic variation explained. The power to detect causal variants of neighbor effects was moderate to strong when a trait was governed by tens of variants. In contrast, there was a reasonable power down when hundreds of variants underlay a single trait. We applied the neighbor GWAS to field herbivory data on 200 accessions of A. thaliana, and found that the neighbor effects more largely contributed to the observed damage variation than self-genotype effects. Interestingly, several defensin family genes were associated with neighbor effects on the herbivory, while self-genotype effects were related to flavin-monooxygenase glucosinolate S-oxygenase 2 (FMO GS-OX2). Overall, the neighbor GWAS highlights the overlooked but significant role of plant neighborhood effects in shaping phenotypic variation, thereby providing a novel and powerful tool to dissect complex traits in spatially structured environments.


INTRODUCTION
Plants are immobile and thus cannot escape their neighbors. In natural and agricultural fields,

88
Assuming that an individual plant is a magnet, two alleles at each locus correspond to the 89 north or south dipole, whereby genome-wide multiple testing across all loci is analogous to a 90 number of parallel two-dimensional layers. The Ising model has a clear advantage in its 91 interpretability, such that (i) the optimization problem for a population sum of trait values can 92 be regarded as an inverse problem of a simple linear model, (ii) the sign of neighbor effects 93 determines a model trend to generate a clustered or checkered spatial pattern of the two 94 states, and (iii) the self-genotypic effect determines the general tendency to favor one allele 95 over another (Fig. 1).
In this study, we proposed a new methodology integrating GWAS and the Ising 97 model, named "neighbor GWAS". The method was applied to simulated phenotypes and real 98 data of field herbivory on A. thaliana. We addressed two specific questions: (i) what spatial 99 and genetic factors influenced the power to detect causal variants? (ii) were neighbor effects 100 significant sources of leaf damage variation in field-grown A. thaliana? Based on the 101 simulation and application, we determined the feasibility of our approach to detect neighbor 102 effects in field-grown plants.
where β0 indicates the intercept, and the term β1xi represents fixed self-genotype effects as 139 tested in conventional GWAS; β2 is the coefficient of fixed neighbor effects, and the neighbor is scaled by the number of neighboring plants, L. Variance components 141 due to a sample structure in self and neighbor effects was modeled by a random effect 142 u i~ Norm(0, σ S 2 K S +σ N 2 K N ). The residual was expressed as e i~ Norm(0, σ e 2 ). p. 7 The n × n variance-covariance matrices represented the similarity in self-genotypes 144 (i.e., kinship) and neighbor covariates among n individual plants as: where q indicates the number of markers. As we defined x i(j) ∈{-1, +1}, the elements of the  GWAS is referred to as "neighbor GWAS" hereafter.
where β12 is the coefficient for asymmetry in neighbor effects. Total variance components due 202 to the three background effects i.e., the self, neighbor, and self-by-neighbor effects is defined and σ SxN 2 , determined the relative importance of self-genotype, neighbor, and asymmetric 205 neighbor effects in ui. Given the n plants × q marker explanatory matrix with ), the similarity in asymmetric neighbor effects was calculated as (i.e., β1 ≠ 0 and β2 ≠ 0 and β12 ≠ 0); another 15% had both the self and neighbor effects, but no 223 asymmetry in the neighbor effects (β1 ≠ 0 and β2 ≠ 0 and β12 = 0); another 35% had self-224 genotypic effects only (β1 ≠ 0); and the remaining 35% had neighbor effects alone (β2 ≠ 0).

225
Given its biological significance, we assumed that some loci having neighbor signals 226 possessed asymmetric interactions between neighbors (β2 ≠ 0 and β12 ≠ 0) while the others 227 had symmetric ones (β2 ≠ 0 and β12 = 0). Therefore, the number of causal SNPs in β12 was 228 smaller than that in the main neighbor effects β2. According to this assumption, the variance 229 component σ SxN 2 was also assumed to be smaller than σ N 2 . cases. The AUCs were also calculated using standard linear models without any random 251 effects to examine whether the linear mixed models were superior to the linear models.

272
The most predominant herbivore in this field trial was the diamond back moth Plutella  p. 14 also significantly affected the AUCs of the self and neighbor effects, but these additional 309 effects were less significant compared to those of PVEβ alone (Table 1). In contrast, the 310 AUCs of neither self nor neighbor effects were significantly affected by the ratio of three 311 variance components σ S 2 :σ N 2 :σ SxN 2 ( Table 1).

312
Notably, there was a clear relationship between the distance decay α and the 313 proportion of phenotypic variation explained by neighbor effects PVEnei or AUCs at different 314 spatial scales (Fig. 3). If the distance decay was weak and the effective range of neighbor 315 effects was broad, PVEnei and AUCs increased linearly as the reference spatial scale was 316 broadened (Fig. 3a). On the other hand, if the distance decay was strong and the effective 317 scale of neighbor effects was narrow, PVEnei saturated at the scale of the first nearest 318 neighbors (Fig. 3c) or AUCs did not increase (Fig. 3b, c). These results remained the same 319 between the number of causal SNPs = 20 and 200 ( Fig. 3 and Fig. S1). The line of simulation 320 results indicated that the effective spatial scales could be estimated by calculating PVEnei 321 across different spatial scales.

322
In the case of the number of causal SNPs = 20, signals of major-effect genes were 323 well detected as AUC ranged from moderate (>0.7) to high (>0.9) (Fig. S2). For the case of  Table 2a). A QQ-plot did not exhibit an 353 inflation of p-values for the self-genotype effects (Fig. S4). 354 We found a marginally significant SNP for neighbor effects at the second and third 355 chromosome (Fig. 4c), of which the second chromosomal region had higher association  Table 2b). Of the genes with these GO annotations, we found 22 low-molecular 366 weight cysteine-rich proteins or plant defensin family proteins (Table S2).

367
Based on the estimated coefficients β 1 and β 2 , we ran a post hoc simulation to infer    was considered a random effect in a linear mixed model, such that u i~ Norm(0, σ S 2 K S +σ N 2 K N ).

695
Once β1 and β2 are determined, we could simulate a genotype distribution that maximizes or 696 minimizes Σyi.  Figure S1.