Publication: Telling Compounds and Phrases Apart in Vietnamese. A Random Forest Classification
| cris.virtual.orcid | https://orcid.org/0000-0002-9087-0565 | |
| cris.virtual.orcid | https://orcid.org/0000-0002-2765-6291 | |
| cris.virtualsource.orcid | 0a73188e-c464-488a-b544-64ea66244d77 | |
| cris.virtualsource.orcid | ce49a3a2-34d2-4fbb-b1fa-436f64ebf8a3 | |
| dc.date.accessioned | 2025-10-21T12:43:13Z | |
| dc.date.available | 2025-10-21T12:43:13Z | |
| dc.date.issued | 2025-09 | |
| dc.description.abstract | Vietnamese is an isolating language with rich productive compounding, but no morphosyntactic, phonotactic or phonological evidence to assume a linguistic level between the syllable and the phrase (Schiering et al. 2010). We model an artificial listener with a Random Forest Classifier to study the phonetic distinguishability of compounds vs. phrases, following Nguyen and Ingram (2007). This Machine Learning algorithm represents the maximal potential for a system to differentiate the two classes based on phonetics alone. It ranks the importance of each phonetic correlate to the differentiation of these classes. This allows an interpretation that goes beyond whether a difference on a particular phonetic dimension exists to include how important that difference is. The results confirm that the two classes can only be phonetically separated under circumstances of maximal contrast, and that maximal contrast is realized through juncture marking. Furthermore, we show that the two classes cannot be perfectly separated even under conditions of maximal contrast and, additionally, that there is an across-the-board preference for a compound interpretation from the phonetic data, even when the Random Forest Classifier was trained on maximal contrast data. | |
| dc.identifier.doi | 10.6519/TJL.202509_23(3).0002 | |
| dc.identifier.issn | 1729-4649 | |
| dc.identifier.uri | https://www.zora.uzh.ch/handle/20.500.14742/237861 | |
| dc.language.iso | eng | |
| dc.subject.ddc | 410 Linguistics | |
| dc.subject.ddc | 490 Other languages | |
| dc.subject.ddc | 400 Language | |
| dc.title | Telling Compounds and Phrases Apart in Vietnamese. A Random Forest Classification | |
| dc.type | article | |
| dcterms.accessRights | info:eu-repo/semantics/openAccess | |
| dcterms.bibliographicCitation.journaltitle | Taiwan Journal of Linguistics | |
| dcterms.bibliographicCitation.number | 23.3 | |
| dcterms.bibliographicCitation.originalpublishername | National Chengchi University | |
| dcterms.bibliographicCitation.pageend | 48 | |
| dcterms.bibliographicCitation.pagestart | 23 | |
| dcterms.bibliographicCitation.volume | 23 | |
| dspace.entity.type | Publication | |
| uzh.contributor.author | Van Ommen, Sandrien | |
| uzh.contributor.author | Torres Orjuela, Catalina | |
| uzh.contributor.author | Dong, Lam Quang | |
| uzh.contributor.author | Giraud, Anne-Lise | |
| uzh.contributor.author | Bickel, Balthasar | |
| uzh.document.availability | published_version | |
| uzh.identifier.doi | https://doi.org/10.5167/uzh-280142 | |
| uzh.oastatus.zora | Gold | |
| uzh.publication.citation | Van Ommen, S., Torres Orjuela, C., Dong, L. Q., Giraud, A.-L., & Bickel, B. (2025). Telling Compounds and Phrases Apart in Vietnamese. A Random Forest Classification. Taiwan Journal of Linguistics, 23(3), 23–48. https://doi.org/10.6519/TJL.202509_23(3).0002 | |
| uzh.publication.freeAccessAt | doi | |
| uzh.publication.originalwork | original | |
| uzh.publication.publishedStatus | final | |
| uzh.workflow.doaj | Yes, journal is listed in DOAJ. | |
| uzh.workflow.fulltextStatus | public | |
| uzh.workflow.rightsCheck | offen | |