Detecting innovations in a parsed corpus of learner English


Schneider, Gerold; Gilquin, Gaëtanelle (2016). Detecting innovations in a parsed corpus of learner English. International Journal of Learner Corpus Research, 2(2):177-204.

Abstract

In research on L2 English, recent corpus-based studies indicate that some non-standard forms are shared by indigenized (ESL) and foreign (EFL) varieties of English, which challenges the idea of a clear dichotomy between innovation and error. We present a data-driven large-scale method to detect innovations, test it on verb + preposition structures (including phrasal verbs) and adjective + preposition structures, and describe similarities and differences between EFL and ESL. We use a dependency-parsed version of the International Corpus of Learner English to automatically extract potential innovations, defined as patterns of overuse compared to the British National Corpus as reference corpus. We measure overuse by means of collocation measures like O/E or T-score, and compare our results with similar results for ESL. In both quantitative and qualitative analyses, we detect similarities between the two varieties (e.g. discuss about) and dissimilarities (e.g. accuse for, only distinctive for EFL). We report more verb/adjective + preposition combinations than previous studies and discuss the roles of analogy and transfer.
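
The two collocation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the expected frequency is derived, so the sketch assumes the common textbook definition: E is the co-occurrence frequency expected under independence, O/E is the ratio of observed to expected frequency, and the T-score is (O - E) / sqrt(O). The function names and all counts are hypothetical and are not taken from the paper or its corpora.

```python
import math


def expected_frequency(f_head: int, f_prep: int, n: int) -> float:
    """Expected co-occurrence count if head and preposition were independent.

    f_head: corpus frequency of the verb or adjective (hypothetical input)
    f_prep: corpus frequency of the preposition (hypothetical input)
    n:      number of candidate pairs in the corpus (hypothetical input)
    """
    return f_head * f_prep / n


def o_over_e(observed: int, f_head: int, f_prep: int, n: int) -> float:
    """Observed/expected ratio; values above 1 indicate attraction."""
    return observed / expected_frequency(f_head, f_prep, n)


def t_score(observed: int, f_head: int, f_prep: int, n: int) -> float:
    """T-score in its usual collocation form: (O - E) / sqrt(O)."""
    if observed <= 0:
        raise ValueError("T-score is undefined for a non-positive observed count")
    expected = expected_frequency(f_head, f_prep, n)
    return (observed - expected) / math.sqrt(observed)


# Invented counts for one verb + preposition pair, for illustration only.
oe = o_over_e(observed=50, f_head=1000, f_prep=20000, n=1_000_000)
ts = t_score(observed=50, f_head=1000, f_prep=20000, n=1_000_000)
print(f"O/E = {oe:.2f}, T-score = {ts:.2f}")
```

Under these assumptions, a pattern would count as a candidate for overuse when its score in the learner corpus clearly exceeds its score in the reference corpus. O/E tends to favour rare combinations, while the T-score favours frequent ones, which is one reason to consult both.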

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 06 Faculty of Arts > English Department
06 Faculty of Arts > Institute of Computational Linguistics
06 Faculty of Arts > Center for Linguistics
Dewey Decimal Classification: 820 English & Old English literatures
Uncontrolled Keywords: Learner English, English as a Foreign Language (EFL), English as a Second Language (ESL), data-driven approach, corpus linguistics, verb-preposition constructions, Cognitive Linguistics, Error Analysis, collocations, linguistic innovations
Language: English
Date: 2016
Deposited On: 16 Feb 2017 14:33
Last Modified: 16 Feb 2017 14:33
Publisher: John Benjamins Publishing
ISSN: 2215-1478
Publisher DOI: https://doi.org/10.1075/ijlcr.2.2.03sch

Download

Content: Published Version
Filetype: PDF - Registered users only
Size: 2MB