Ostracism, or social exclusion, is widespread and associated with a range of detrimental psychological and social outcomes. Ostracism is typically explained as instrumental punishment of free-riders or deviants. However, this instrumental account fails to explain many features of real-world ostracism, including its prevalence. Here we hypothesized that ostracism can emerge incidentally (non-instrumentally) when people choose partners in social interactions, and that this process is driven by simple learning mechanisms. We tested this hypothesis in four experiments (n = 456) with economic games in dynamic social networks. Contrary to the instrumental account of ostracism, we found that the targets of ostracism were not primarily free-riders. Instead, incidental initial variability in choosing partners for social interactions predicted later ostracism better than the instrumental account did. Using computational modelling, we showed that simple reinforcement learning mechanisms explain the incidental emergence of ostracism, and that they do so better than a formalization of the instrumental account. Finally, we leveraged these reinforcement learning mechanisms to experimentally reduce incidental ostracism. Our results demonstrate that ostracism is more incidental than previously assumed and can arise from basic forms of learning. They also show that the same mechanisms that result in incidental ostracism can help to reduce its emergence.