Using automatically parsed corpora to discover lexico-grammatical features of English varieties


Schneider, Gerold (2011). Using automatically parsed corpora to discover lexico-grammatical features of English varieties. In: 30th International Conference on Lexis and Grammar, Nicosia, Cyprus, 5 October 2011 - 8 October 2011, 251-258.

Abstract

We employ syntactic parsing to describe and discover lexico-grammatical features of regional varieties of English. In the absence of suitable treebanks, automatically parsed corpora (tree jungles) can be used instead. As an example, we focus on Indian English, using the International Corpus of English (ICE) and the British National Corpus (BNC). Our method is largely corpus-driven. The corpora differ little in the frequencies of syntactic relations, but considerably once the intricate relations between grammar and lexis are taken into account. We describe differences in the use of zero articles, verb-preposition constructions, and ditransitive verbs, and show that relatively small corpora suffice to discover subtle lexico-grammatical differences.

Downloads

231 downloads since deposited on 05 Mar 2012
33 downloads in the past 12 months

Additional indexing

Item Type: Conference or Workshop Item (Other), refereed, original work
Communities & Collections: 06 Faculty of Arts > English Department; 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 820 English & Old English literatures; 410 Linguistics
Uncontrolled Keywords: lexico-grammar, syntactic parsing, language variation, Indian English, corpus-driven
Language: English
Event End Date: 8 October 2011
Deposited On: 05 Mar 2012 13:23
Last Modified: 11 May 2016 07:49
Publisher: University of Cyprus, Department of French Studies and Modern Languages
Official URL: http://infolingu.univ-mlv.fr/Colloques/lgc/index.php?year=2011&lang=en&page=1
Permanent URL: http://doi.org/10.5167/uzh-52963

Download

Content: Accepted Version
Language: English
Filetype: PDF
Size: 191kB
