
Parser-based analysis of syntax-lexis interactions


Lehmann, Hans Martin; Schneider, Gerold (2009). Parser-based analysis of syntax-lexis interactions. In: Jucker, Andreas H; Schreier, Daniel; Hundt, Marianne. Corpora: Pragmatics and Discourse. Amsterdam, The Netherlands: Rodopi, 477-502.

Abstract

Fixedness in language has been extensively studied in areas like multi-word units, idiomatic expressions, collocations and verb-particle constructions. These have often been treated as relatively fixed non-compositional sequences, which allow for little variation. In our paper we will focus on co-occurrence phenomena between elements in syntactic relations. Specifically, we focus on subject-verb and verb-object relations in active and passive constructions. Looking for fixedness in these syntactic relations where compositionality is expected to hold to a large degree may strike the reader as a strange undertaking. Our main interest lies in establishing how far an open choice principle holds for these relations and to what degree we can find fixedness in these syntactic relations.
The identification of syntactic relations requires syntactically annotated corpora. Most standard corpora of sufficient size are either not annotated at all, or annotated at the non-hierarchical level of part-of-speech tags only. They typically contain no hierarchical information about the syntactic organisation of sentences.
Parsing approaches to fixedness are still quite rare. Exceptions are Lin (1998) and Seretan and Wehrli (2006). Robust broad-coverage syntactic parsers, for example Schneider (2007) or Andersen (2008), have now become available, offering new perspectives for this research.
This paper describes the syntactic annotation of over 160 million running words with the help of Pro3Gres, a dependency parser. See Schneider (2007) for a more detailed description. We document the extraction of a database with verb centres and their dependents. We then explore the possibilities and limitations of this dependency database for the study of fixedness in syntactic relations.
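As an illustration of how a database of verb centres and their dependents might be queried for fixedness, the sketch below scores verb-object pairs by pointwise mutual information (PMI). This is only an assumption made for the example: the abstract does not say which association measure the authors use, and the pairs listed here are invented toy data, not output from Pro3Gres. The function name `pmi_scores` is likewise hypothetical.

```python
import math
from collections import Counter

# Invented (verb, object) lemma pairs standing in for rows of a
# dependency database; these are NOT taken from the paper's corpus.
pairs = [
    ("take", "place"), ("take", "place"), ("take", "place"),
    ("take", "book"), ("read", "book"), ("read", "book"),
    ("see", "place"), ("see", "book"),
]


def pmi_scores(pairs):
    """Return the pointwise mutual information of each distinct pair.

    PMI is chosen here as an assumption; it is one of several common
    association measures and may not be the one used in the paper.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    obj_counts = Counter(obj for _, obj in pairs)
    total = len(pairs)

    scores = {}
    for (verb, obj), n in pair_counts.items():
        p_pair = n / total
        p_verb = verb_counts[verb] / total
        p_obj = obj_counts[obj] / total
        # Positive values: the pair co-occurs more often than chance predicts.
        scores[(verb, obj)] = math.log2(p_pair / (p_verb * p_obj))
    return scores


scores = pmi_scores(pairs)
# List pairs from most to least strongly associated.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

A pair with a clearly positive score is a candidate for fixedness, while a score near zero is what an open choice between verb and object would predict. PMI is known to overrate rare pairs, so on real corpus counts a frequency threshold or a different measure would probably be needed.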

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
    06 Faculty of Arts > English Department
Dewey Decimal Classification: 000 Computer science, knowledge & systems
    820 English & Old English literatures
    410 Linguistics
Uncontrolled Keywords: corpus linguistics, dependency grammar, collocations, passive, lexical preferences
Language: English
Date: 2009
Deposited On: 14 Jan 2010 14:02
Last Modified: 05 Apr 2016 13:35
Publisher: Rodopi
Series Name: Language and Computers: Studies in Practical Linguistics
Number: 68
ISBN: 978-90-420-2592-9
Additional Information: Papers from the 29th International Conference on English Language Research on Computerized Corpora (ICAME 29), Ascona, Switzerland, 14-18 May 2008
Official URL: http://www.rodopi.nl/senj.asp?BookId=LC+68
Related URLs: http://opac.nebis.ch/F/?local_base=NEBIS&con_lng=GER&func=find-b&find_code=SYS&request=005812973
    https://www.zora.uzh.ch/18735/
Permanent URL: http://doi.org/10.5167/uzh-24617
