Abstract
This paper uses a novel data-driven probabilistic approach to address the century-old Inner-Outer hypothesis of Indo-Aryan. I develop a Bayesian hierarchical mixed-membership model to assess the validity of this hypothesis, using a large data set of automatically extracted sound changes operating between Old Indo-Aryan and Modern Indo-Aryan speech varieties. I employ different prior distributions to model sound change; one of these, the Logistic Normal distribution, has received little attention in linguistics outside Natural Language Processing despite its many attractive features. I find evidence for cohesive dialect groups that have left their imprint on contemporary Indo-Aryan languages. When a Logistic Normal prior is used, the distribution of dialect components across languages is largely compatible with a core-periphery pattern similar to that proposed under the Inner-Outer hypothesis.