Header

UZH-Logo

Maintenance Infos

Automatic Annotation and Assessment of Syntactic Structures in Law Texts Combining Rule-Based and Statistical Methods


Sugisaki, Kyoko. Automatic Annotation and Assessment of Syntactic Structures in Law Texts Combining Rule-Based and Statistical Methods. 2016, University of Zurich, Faculty of Arts.

Abstract

In this thesis, I investigate and develop methods for automatically analyzing and assess- ing German syntactic structures in domain-specific texts. As domain-specific texts, I use Swiss German-language law texts.
The automatic annotation of syntactic structures has long been studied in the research on natural language processing. Supervised statistical methods are regarded as state-of-the- art parsing methods, which are accurate but biased by the type of text. Consequently, the accuracy of statistical parsers decreases if they are used on domain-specific texts. The problem of domain bias in syntactic annotation should be solved if it directly affects the accuracy of an application. The syntactic assessment that I develop in this thesis is such an application that requires high accuracy of syntactic annotation. An effective solution to this problem would be the manual annotation of a large portion of the required domain texts. However, it is not feasible in practice because manual linguistic annotation is extremely labor intensive. To overcome this problem, I develop syntactic annotation methods that do not require the manual annotation of a large portion of the domain texts. The goal of this thesis is that the annotation accuracy on domain-specific texts is so high that it can be used for the application.
For the automatic syntactic assessment, I demonstrate a novel approach to model domain-specific style choice by combining rule-based and statistical methods. In the rule-based approach, I present a method that automatically detects the violations of style rules in legislative style guidelines. In the statistical approach, domain-specific writing style is defined in terms of stylistic choice between syntactic alternations. The syntactic selection is statistically modeled by classifying syntactic alternatives according to their syntactic complexity. The syntactic assessment requires automatic syntactic annotation.
For the automatic syntactic annotation, I present a linguistically motivated hybrid su- pertagger that analyzes topological dependency grammar relations in the German lan- guage. In this thesis, supertagging problems are seen as morphosyntactic ambiguity and syntactic resolution. Depending on the linguistic phenomena, the ambiguity is resolved by applying a rule-based and statistical tagging method: Morphological and syntactic hard constraints are applied in a constraint grammar approach. In contrast, lexical, semantic, and pragmatic soft and multivariate constraints are integrated into a conditional random fields model.
The main contribution of this thesis to the study of natural language processing is to show that a linguistically motivated annotation method is a viable approach to achieving a high performance of syntactic analysis with a few hundreds of manually annotated sentences from the domain.

Abstract

In this thesis, I investigate and develop methods for automatically analyzing and assess- ing German syntactic structures in domain-specific texts. As domain-specific texts, I use Swiss German-language law texts.
The automatic annotation of syntactic structures has long been studied in the research on natural language processing. Supervised statistical methods are regarded as state-of-the- art parsing methods, which are accurate but biased by the type of text. Consequently, the accuracy of statistical parsers decreases if they are used on domain-specific texts. The problem of domain bias in syntactic annotation should be solved if it directly affects the accuracy of an application. The syntactic assessment that I develop in this thesis is such an application that requires high accuracy of syntactic annotation. An effective solution to this problem would be the manual annotation of a large portion of the required domain texts. However, it is not feasible in practice because manual linguistic annotation is extremely labor intensive. To overcome this problem, I develop syntactic annotation methods that do not require the manual annotation of a large portion of the domain texts. The goal of this thesis is that the annotation accuracy on domain-specific texts is so high that it can be used for the application.
For the automatic syntactic assessment, I demonstrate a novel approach to model domain-specific style choice by combining rule-based and statistical methods. In the rule-based approach, I present a method that automatically detects the violations of style rules in legislative style guidelines. In the statistical approach, domain-specific writing style is defined in terms of stylistic choice between syntactic alternations. The syntactic selection is statistically modeled by classifying syntactic alternatives according to their syntactic complexity. The syntactic assessment requires automatic syntactic annotation.
For the automatic syntactic annotation, I present a linguistically motivated hybrid su- pertagger that analyzes topological dependency grammar relations in the German lan- guage. In this thesis, supertagging problems are seen as morphosyntactic ambiguity and syntactic resolution. Depending on the linguistic phenomena, the ambiguity is resolved by applying a rule-based and statistical tagging method: Morphological and syntactic hard constraints are applied in a constraint grammar approach. In contrast, lexical, semantic, and pragmatic soft and multivariate constraints are integrated into a conditional random fields model.
The main contribution of this thesis to the study of natural language processing is to show that a linguistically motivated annotation method is a viable approach to achieving a high performance of syntactic analysis with a few hundreds of manually annotated sentences from the domain.

Statistics

Downloads

49 downloads since deposited on 14 Oct 2016
46 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Dissertation
Referees:Volk Martin, Schneider Gerold, Kübler Sandra
Communities & Collections:06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification:000 Computer science, knowledge & systems
410 Linguistics
Language:English
Date:February 2016
Deposited On:14 Oct 2016 14:47
Last Modified:08 Dec 2017 20:33

Download

Download PDF  'Automatic Annotation and Assessment of Syntactic Structures in Law Texts Combining Rule-Based and Statistical Methods'.
Preview
Filetype: PDF
Size: 2MB