Identifying phrasemes via interlingual association measures - A data-driven approach on dependency-parsed and word-aligned parallel corpora


Graën, Johannes (2017). Identifying phrasemes via interlingual association measures - A data-driven approach on dependency-parsed and word-aligned parallel corpora. In: Konecny, Christine; Autelli, Erica; Abel, Andrea; Zanasi, Lorenzo. Lexemkombinationen und typisierte Rede im mehrsprachigen Kontext. Tübingen: Stauffenburg Verlag, in press.

Abstract

In corpus linguistics, statistical association measures play a major role in identifying collocations such as ‘play’ and ‘role’ in ‘play a role’. Two words that appear in the same context considerably more often than a random distribution would predict are collocates. Together they typically convey meaning beyond the bare combination of both words’ semantics.

We employ the same association measures on interlingual word cooccurrences based on statistical word alignment and combine them with intralingual association measures on syntactical dependency relations in order to identify phrasemes. Support verb constructions exemplify our approach. They are characterized by the respective verb contributing little to the semantics of the whole construction, which we can determine with the aid of our intralingual association measures.
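
As an illustration of what a statistical association measure computes, the following is a minimal sketch in Python. It uses pointwise mutual information (PMI), which is only one common choice; the abstract does not say which measures the author applies, so the measure, the function name `pmi`, and all counts here are assumptions made for the example. The score compares how often two words are observed together with how often they would be expected together if they were distributed independently.

```python
import math


def pmi(pair_count, count_x, count_y, total):
    """Pointwise mutual information of a word pair (illustrative only).

    pair_count: number of contexts in which both words occur together
    count_x, count_y: number of contexts containing each word
    total: total number of contexts in the corpus
    """
    if min(pair_count, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    # Cooccurrences expected if the two words were independent.
    expected = count_x * count_y / total
    # Positive score: the pair occurs more often than chance predicts.
    return math.log2(pair_count / expected)


# Hypothetical counts, not taken from the publication.
score = pmi(pair_count=50, count_x=1000, count_y=500, total=1_000_000)
print(f"PMI = {score:.2f}")
```

The same function could in principle be applied to intralingual pairs (for example a verb and its dependent in a dependency relation) or to interlingual pairs (a word and the word it is aligned to in a parallel corpus), since in both cases the inputs are cooccurrence counts.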

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English, German, Italian
Date: 2017
Deposited On: 10 Mar 2017 13:23
Last Modified: 13 Mar 2017 07:25
Publisher: Stauffenburg Verlag
Series Name: Stauffenburg Linguistik
ISSN: 1430-4139

Download

Full text not available from this repository.