Identifying phrasemes via interlingual association measures - A data-driven approach on dependency-parsed and word-aligned parallel corpora


Graën, Johannes (2021). Identifying phrasemes via interlingual association measures - A data-driven approach on dependency-parsed and word-aligned parallel corpora. In: Konecny, Christine; Autelli, Erica; Abel, Andrea; Zanasi, Lorenzo (eds.). Lexemkombinationen und typisierte Rede im mehrsprachigen Kontext. Tübingen: Stauffenburg Verlag, in press.

Abstract

In corpus linguistics, statistical association measures play a major role in identifying collocations such as ‘play’ and ‘role’ in ‘play a role’. Two words that appear considerably more often in the same context than one would expect from a random distribution are called collocates. Together, they typically convey meaning beyond the mere combination of the two words’ individual semantics.
We apply the same association measures to interlingual word co-occurrences derived from statistical word alignment and combine them with intralingual association measures on syntactic dependency relations in order to identify phrasemes. We illustrate our approach with support verb constructions. These are characterized by the verb contributing little to the semantics of the construction as a whole, a property we can determine with the aid of our intralingual association measures.
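To illustrate what "interlingual word co-occurrences based on statistical word alignment" can mean, here is a sketch under stated assumptions, not the author's implementation: each alignment link between a source lemma and a target lemma counts as one co-occurrence, and the sample size is the total number of links. The input format `(src_lemmas, tgt_lemmas, links)` and the function `interlingual_counts` are hypothetical; PMI is used as one example measure.

```python
import math
from collections import Counter


def interlingual_counts(aligned_pairs):
    """Count lemma co-occurrences over alignment links.

    aligned_pairs: iterable of (src_lemmas, tgt_lemmas, links), where links
    is a list of (i, j) index pairs into the source and target lemma lists.
    (Hypothetical input format.)
    """
    joint, src, tgt = Counter(), Counter(), Counter()
    n = 0  # total number of alignment links = sample size
    for src_lemmas, tgt_lemmas, links in aligned_pairs:
        for i, j in links:
            s, t = src_lemmas[i], tgt_lemmas[j]
            joint[(s, t)] += 1
            src[s] += 1
            tgt[t] += 1
            n += 1
    return joint, src, tgt, n


def pmi(f_xy, f_x, f_y, n):
    """log2 of observed link count over the count expected under independence."""
    return math.log2(f_xy * n / (f_x * f_y))


# Toy, lemmatized German-English data (invented for illustration).
corpus = [
    (["er", "spielen", "eine", "Rolle"], ["he", "play", "a", "role"],
     [(0, 0), (1, 1), (2, 2), (3, 3)]),
    (["sie", "spielen", "Fußball"], ["they", "play", "football"],
     [(0, 0), (1, 1), (2, 2)]),
    (["die", "Rolle", "sein", "wichtig"], ["the", "role", "be", "important"],
     [(0, 0), (1, 1), (2, 2), (3, 3)]),
]

joint, src, tgt, n = interlingual_counts(corpus)
score = pmi(joint[("Rolle", "role")], src["Rolle"], tgt["role"], n)
```

On this toy corpus there are 11 links, and "Rolle" is aligned to "role" both times either occurs, giving PMI = log2(11 · 2 / (2 · 2)) = log2(5.5). Real systems would work on much larger corpora and typically restrict links to lemmas in a given dependency relation.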

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
06 Faculty of Arts > Linguistic Research Infrastructure (LiRI)
Dewey Decimal Classification: 000 Computer science, knowledge & systems
410 Linguistics
Language: English, German, Italian
Date: 2021
Deposited On: 10 Mar 2017 13:23
Last Modified: 21 Mar 2023 12:04
Publisher: Stauffenburg Verlag
Series Name: Stauffenburg Linguistik
ISSN: 1430-4139
ISBN: 978-3-95809-539-7
OA Status: Closed
Full text not available from this repository.