Abstract
Szmrecsanyi <jats:italic>et al.</jats:italic> (2016) define probabilistic indigenization as the process whereby probabilistic constraints shape variation patterns in different ways, eventually leading to greater heterogeneity in the constraints that govern syntactic variation across varieties of English. The present study extends our knowledge of the heterogeneity of probabilistic grammars by sketching a corpus-based variationist method for calculating the similarity between varieties, drawing inspiration from the comparative sociolinguistics literature. Based on linguistic material from the <jats:italic>International Corpus of English</jats:italic>, we ascertain the degree of regional variability of five probabilistic constraints on the genitive, dative, particle placement and subject pronoun omission alternations across three varieties of English, namely British, Indian and Singapore English. Our results indicate that, of the four alternations under study, the genitive alternation is the most homogeneous from a regional perspective, followed, in increasing order of heterogeneity, by the subject pronoun omission, dative and particle placement alternations. On the basis of these findings, we evaluate the claim made in the literature that the extent of probabilistic indigenization is proportional to the lexical specificity of the syntactic phenomenon under study, a hypothesis that is borne out by our data.