This dissertation explores a data-driven methodology for finding recurrent structure within and between languages. The goal is to develop a method that accounts for variation in language data more accurately and detects subtle regularities that are difficult to identify by traditional means. The dissertation takes clause linkage constructions as a case study, since this is a particularly complex area of grammar that is closely tied to discourse patterns. The proposed method is to annotate language corpora for form and meaning structures and subsequently to explore the emerging correlations using a custom data mining algorithm. Particular attention is given to elaborating the formal models used to annotate meaning in corpora and to developing the data mining algorithm. This methodology is then applied to sample corpora of English, Chintang and Latin as a pilot study, and the discovered structures are discussed. We observe that a) despite obvious typological differences between the examined languages, there are striking similarities in the distributions of the annotated features, and b) the proposed method, despite its limitations, is able to detect both highly abstract discourse structures and concrete grammatical constructions within the languages.