Digging deep: mining corpora for typological patterns and beyond


Zakharko, Taras. Digging deep: mining corpora for typological patterns and beyond. 2015, University of Zurich, Faculty of Arts.

Abstract

This dissertation explores data-driven methodology of finding recurrent structure within and between languages. The goal is to develop a method that is able to account for variation in the language data more accurately, as well as detect subtle regularities that are difficult to detect by traditional means. The dissertation specifically deals with clause linkage constructions as a case study, since this is a particularly complex area of grammar which is closely tied to discourse patterns. The proposed method is to annotate language corpora for form and meaning structures and subsequently to explore the emerging correlations using a custom data mining algorithm. Particular attention is given to elaboration of the formal models used to annotate meaning in corpora, as well as to developing the data mining algorithm. This methodology is then applied to sample corpora of English, Chintang and Latin as a pilot study, and the discovered structures are discussed. We observe that a) despite obvious typological differences between the examined languages there are striking similarities in the distributions of the annotated features and b) that the proposed method, despite its limitations, is able to detect both highly abstract discourse structures and concrete grammatical constructions within the languages.

Additional indexing

Item Type: Dissertation (monographical)
Referees: Bickel Balthasar, Gast Volker
Communities & Collections: 06 Faculty of Arts > Institute of German Studies
06 Faculty of Arts > Department of Comparative Linguistics
Dewey Decimal Classification: 430 German & related languages
Language: English
Date: 2015
Deposited On: 23 Mar 2019 17:47
Last Modified: 25 Sep 2019 00:31
Number of Pages: 168
OA Status: Closed
Free access at: Related URL. An embargo period may apply.
Related URLs: https://www.recherche-portal.ch/primo-explore/fulldisplay?docid=ebi01_prod011141655&context=L&vid=ZAD&search_scope=default_scope&tab=default_tab&lang=de_DE (Library Catalogue)
