
Applying Computational Linguistics and Language Models: From Descriptive Linguistics to Text Mining and Psycholinguistics


Schneider, Gerold. Applying Computational Linguistics and Language Models: From Descriptive Linguistics to Text Mining and Psycholinguistics. 2014, University of Zurich, Faculty of Arts.

Abstract

This synopsis presents the application of computational linguistic tools and approaches developed by the author to Descriptive Linguistics, Text Mining, and Psycholinguistics. It also describes how these tools, which are originally based on linguistic insights and assumptions, lead to new and detailed linguistic insights when applied to different research areas, and can in turn improve the computational tools themselves. The computational tools are based on models of language, predicting part-of-speech tags or syntactic attachment. These models, which were originally designed for the practical purpose of solving a computational linguistics task, can increasingly be used as models of human language processing.
A large-scale syntactic parser is the core linguistic tool that I use. I also employ its preprocessing tools, part-of-speech taggers and chunkers, as well as approaches that learn from the data, so-called data-driven approaches. The use of syntactic parsing opens up a wide range of possibilities. In the first chapter, I summarise my applications of syntactic parsing, its preprocessing tools, and other computational linguistic approaches for the benefit of Descriptive Linguistics. I describe collocations, language variation, alternations, and language change. I also describe the obvious advantage of an automatic approach: the sheer amount of data that can be processed, and the consistency, which can lead to the data-driven detection of new patterns. I also address the obvious disadvantage of using an automatic tool: there is always a certain level of errors, which makes evaluations essential.
In the second chapter, I describe the application of the same tools to Biomedical Text Mining. I evaluate the performance of our approach and summarise insights from a linguistic perspective, leaving aside more technical aspects.
In the third chapter, I argue that a syntactic parser, in particular my approach which draws a clear division between competence and performance, can be used as a model to explore formulaic and creative language use, starting with Sinclair’s (1991) distinction between idiom principle and syntax principle, and ending with the suggestion to use the parser as a psycholinguistic model.
This synopsis aims to summarise 16 publications and show the connections that hold between them.



Additional indexing

Item Type: Habilitation
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; 06 Faculty of Arts > Center for Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Uncontrolled Keywords: computational linguistics, descriptive linguistics, text mining, psycholinguistics, language modelling, syntactic parsing, natural language processing, language processing model
Language: English
Date: 2014
Deposited On: 12 Feb 2015 08:29
Last Modified: 05 Apr 2016 19:03
Number of Pages: 350
Permanent URL: https://doi.org/10.5167/uzh-108379

Download

Filetype: PDF
Size: 2MB
