Publication:

Large-scale Hierarchical Alignment for Author Style Transfer

Date
2018
Working Paper
dc.contributor.institution: Cornell University
dc.date.accessioned: 2019-05-16T09:47:42Z
dc.date.available: 2019-05-16T09:47:42Z
dc.date.issued: 2018
dc.description.abstract:

We propose a simple method for extracting pseudo-parallel monolingual sentence pairs from comparable corpora representative of two different author styles, such as scientific papers and Wikipedia articles. Our approach is to first hierarchically search for nearest document neighbours and then for sentences therein. We demonstrate the effectiveness of our method through automatic and extrinsic evaluation on two tasks: text simplification from Wikipedia to Simple Wikipedia and style transfer from scientific journal articles to press releases. We show that pseudo-parallel sentences extracted with our method not only improve existing parallel data, but can even lead to competitive performance on their own.
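The two-stage search described in the abstract (nearest documents first, then nearest sentences inside each matched document pair) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The bag-of-words cosine similarity, the function names (`bow`, `cosine`, `nearest`, `hierarchical_align`), the `k_docs` parameter and the 0.4 `threshold` are all hypothetical stand-ins chosen for illustration; the paper should be consulted for the representations and cut-offs it actually uses.

```python
import math
from collections import Counter


def bow(text):
    # Hypothetical representation: lower-cased bag of words (not the paper's).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest(query, candidates, k):
    # Indices of the k candidates most similar to the query, best first.
    order = sorted(range(len(candidates)),
                   key=lambda i: cosine(query, candidates[i]),
                   reverse=True)
    return order[:k]


def hierarchical_align(src_docs, tgt_docs, k_docs=1, threshold=0.4):
    """Each document is a list of sentence strings.

    Returns (source sentence, target sentence, score) pseudo-parallel pairs.
    k_docs and threshold are illustrative assumptions.
    """
    src_vecs = [bow(" ".join(d)) for d in src_docs]
    tgt_vecs = [bow(" ".join(d)) for d in tgt_docs]
    pairs = []
    for i, doc_vec in enumerate(src_vecs):
        # Stage 1: document-level nearest neighbours.
        for j in nearest(doc_vec, tgt_vecs, k_docs):
            tgt_sents = [bow(s) for s in tgt_docs[j]]
            # Stage 2: sentence-level nearest neighbour within the matched document.
            for sent in src_docs[i]:
                q = bow(sent)
                best = nearest(q, tgt_sents, 1)
                if best:
                    score = cosine(q, tgt_sents[best[0]])
                    if score >= threshold:
                        pairs.append((sent, tgt_docs[j][best[0]], round(score, 3)))
    return pairs


# Toy usage with invented sentences (not data from the paper).
src = [["the cat sat on the mat", "dogs bark loudly at night"],
       ["stock prices fell sharply today"]]
tgt = [["a cat sat on a mat", "dogs bark at night"],
       ["prices of stocks fell today"]]
pairs = hierarchical_align(src, tgt)
for p in pairs:
    print(p)
```

Restricting the sentence search to the sentences of already-matched documents is what keeps the second stage cheap: each source sentence is compared only against a handful of candidate documents rather than against every target sentence in the corpus.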

dc.identifier.issn: 2331-8422
dc.identifier.uri: https://www.zora.uzh.ch/handle/20.500.14742/157554
dc.language.iso: eng
dc.subject.ddc: 570 Life sciences; biology
dc.title:

Large-scale Hierarchical Alignment for Author Style Transfer

dc.type: working_paper
dcterms.accessRights: info:eu-repo/semantics/openAccess
dcterms.bibliographicCitation.number: 1810.08237
dcterms.bibliographicCitation.url: https://arxiv.org/abs/1810.08237
dspace.entity.type: Publication (language: en)
uzh.contributor.author: Nikolov, Nikola I
uzh.contributor.author: Hahnloser, Richard H R
uzh.contributor.correspondence: Yes
uzh.contributor.correspondence: No
uzh.date.akaber: 2019
uzh.document.availability: postprint
uzh.eprint.datestamp: 2019-05-16 09:47:42
uzh.eprint.lastmod: 2023-09-22 13:10:11
uzh.eprint.statusChange: 2019-05-16 09:47:42
uzh.harvester.eth: Yes
uzh.harvester.nb: No
uzh.identifier.doi: 10.5167/uzh-170856
uzh.oastatus.zora: Green
uzh.publication.citation: Nikolov, Nikola I; Hahnloser, Richard H R (2018). Large-scale Hierarchical Alignment for Author Style Transfer. ArXiv.org 1810.08237, Cornell University.
uzh.publication.freeAccessAt: officialurl
uzh.publication.seriesTitle: ArXiv.org
uzh.workflow.eprintid: 170856
uzh.workflow.fulltextStatus: public
uzh.workflow.revisions: 20
uzh.workflow.rightsCheck: keininfo
uzh.workflow.status: archive
Files

Original bundle

Name: 1810.08237.pdf
Size: 427.85 KB
Format: Adobe Portable Document Format