ZORA (Zurich Open Repository and Archive)

Automatic Annotation and Assessment of Syntactic Structures in Law Texts Combining Rule-Based and Statistical Methods

Sugisaki, Kyoko. Automatic Annotation and Assessment of Syntactic Structures in Law Texts Combining Rule-Based and Statistical Methods. 2016, University of Zurich, Faculty of Arts.

Abstract

In this thesis, I investigate and develop methods for automatically analyzing and assessing German syntactic structures in domain-specific texts. The domain-specific texts are Swiss law texts written in German.
The automatic annotation of syntactic structures has long been studied in natural language processing research. Supervised statistical methods are regarded as the state of the art in parsing: they are accurate, but biased toward the type of text they were trained on. Consequently, the accuracy of statistical parsers decreases when they are applied to domain-specific texts. This domain bias must be addressed whenever it directly affects the accuracy of an application. The syntactic assessment that I develop in this thesis is one such application, since it requires highly accurate syntactic annotation. An effective solution would be to manually annotate a large portion of the required domain texts. However, this is not feasible in practice because manual linguistic annotation is extremely labor intensive. To overcome this problem, I develop syntactic annotation methods that do not require the manual annotation of a large portion of the domain texts. The goal of this thesis is to make annotation accuracy on domain-specific texts high enough for use in the application.
For the automatic syntactic assessment, I demonstrate a novel approach that models domain-specific style choice by combining rule-based and statistical methods. In the rule-based approach, I present a method that automatically detects violations of the style rules laid down in legislative style guidelines. In the statistical approach, domain-specific writing style is defined as the stylistic choice between syntactic alternations. This choice is modeled statistically by classifying syntactic alternatives according to their syntactic complexity. The syntactic assessment requires automatic syntactic annotation.
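As a rough illustration of the rule-based side, the following Python sketch flags sentences that break two style rules. Everything in it is an assumption made for illustration: the rule names, the threshold `MAX_TOKENS`, and the `SUBORDINATORS` list are hypothetical and are not taken from the thesis or from any legislative style guideline. The sketch also works on surface tokens only, whereas the approach described above relies on syntactic annotation.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical values chosen for illustration; not taken from any guideline.
MAX_TOKENS = 12
SUBORDINATORS = {"dass", "weil", "wenn", "sofern", "soweit", "obwohl"}


@dataclass
class Violation:
    rule: str      # name of the (hypothetical) style rule
    detail: str    # short explanation of why the rule fired


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def check_sentence(sentence: str) -> List[Violation]:
    """Return the violations of the two toy style rules for one sentence."""
    tokens = tokenize(sentence)
    violations: List[Violation] = []

    # Toy rule 1: the sentence is longer than the allowed token count.
    if len(tokens) > MAX_TOKENS:
        violations.append(Violation("max-length", f"{len(tokens)} tokens"))

    # Toy rule 2: more than one subordinating conjunction in one sentence.
    n_sub = sum(1 for t in tokens if t in SUBORDINATORS)
    if n_sub > 1:
        violations.append(
            Violation("multiple-subordination", f"{n_sub} subordinators")
        )
    return violations


short = "Die Behörde erteilt die Bewilligung."
long_ = (
    "Die Behörde erteilt die Bewilligung, wenn der Gesuchsteller nachweist, "
    "dass er die Voraussetzungen erfüllt, sofern keine Gründe entgegenstehen."
)
for v in check_sentence(long_):
    print(v.rule, "-", v.detail)
```

Counting conjunctions is only a crude stand-in for syntactic complexity. A checker built on real syntactic annotation could measure embedding depth directly, which is why the assessment depends on accurate annotation.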
For the automatic syntactic annotation, I present a linguistically motivated hybrid supertagger that analyzes topological dependency grammar relations in German. In this thesis, supertagging is treated as a problem of morphosyntactic ambiguity and its syntactic resolution. Depending on the linguistic phenomenon, the ambiguity is resolved by either a rule-based or a statistical tagging method. Morphological and syntactic hard constraints are applied in a constraint grammar approach. In contrast, lexical, semantic, and pragmatic soft and multivariate constraints are integrated into a conditional random fields model.
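The division of labor between hard and soft constraints can be sketched in a few lines of Python. This is a toy under stated assumptions, not the thesis's supertagger: the tag names, the single rule, and the score table are hypothetical, and the score lookup in `pick_with_soft_scores` is a simple stand-in for a conditional random fields model, which would score whole tag sequences rather than single tokens.

```python
from typing import Callable, Dict, List, Set

Rule = Callable[[List[str], List[Set[str]], int], Set[str]]


def apply_hard_constraints(
    tokens: List[str], readings: List[Set[str]], rules: List[Rule]
) -> List[Set[str]]:
    """Constraint-grammar style filtering: drop readings that a rule forbids,
    but never delete the last remaining reading of a token."""
    result = [set(r) for r in readings]
    for i in range(len(tokens)):
        for rule in rules:
            remaining = result[i] - rule(tokens, result, i)
            if remaining:
                result[i] = remaining
    return result


def no_nominative_after_preposition(
    tokens: List[str], readings: List[Set[str]], i: int
) -> Set[str]:
    """Hypothetical hard rule: a determiner that directly follows an
    unambiguous preposition cannot be nominative."""
    if i > 0 and readings[i - 1] == {"PREP"}:
        return {"DET-NOM"}
    return set()


def pick_with_soft_scores(
    readings: List[Set[str]], scores: Dict[str, float]
) -> List[str]:
    """Stand-in for the statistical step: choose the best-scoring reading
    that survived the hard constraints (ties broken alphabetically)."""
    return [max(sorted(r), key=lambda tag: scores.get(tag, 0.0)) for r in readings]


# "der" is ambiguous between several case readings in German.
tokens = ["mit", "der", "Behörde"]
readings = [{"PREP"}, {"DET-NOM", "DET-DAT", "DET-GEN"}, {"NOUN"}]

filtered = apply_hard_constraints(tokens, readings, [no_nominative_after_preposition])
scores = {"DET-DAT": 0.7, "DET-GEN": 0.3}  # hypothetical soft preferences
tags = pick_with_soft_scores(filtered, scores)
print(tags)
```

The hard rule removes a reading outright, while the score table only ranks the readings that remain. This mirrors the split described above, in which categorical constraints are handled by rules and graded, interacting preferences are left to the statistical model.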
The main contribution of this thesis to natural language processing is to show that a linguistically motivated annotation method is a viable way to achieve high-performance syntactic analysis with only a few hundred manually annotated sentences from the domain.

Additional indexing

Item Type: Dissertation (monographical)
Referees: Volk Martin, Schneider Gerold, Kübler Sandra
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics; UZH Dissertations
Dewey Decimal Classification: 000 Computer science, knowledge & systems; 410 Linguistics
Language: English
Date: February 2016
Deposited On: 14 Oct 2016 14:47
Last Modified: 19 Aug 2021 20:14
OA Status: Green

Statistics

Downloads

295 downloads since deposited on 14 Oct 2016
50 downloads in the past 12 months