Publication:

Telling Compounds and Phrases Apart in Vietnamese. A Random Forest Classification

Date

2025
Journal Article
Published version
cris.virtual.orcid: https://orcid.org/0000-0002-9087-0565
cris.virtual.orcid: https://orcid.org/0000-0002-2765-6291
cris.virtualsource.orcid: 0a73188e-c464-488a-b544-64ea66244d77
cris.virtualsource.orcid: ce49a3a2-34d2-4fbb-b1fa-436f64ebf8a3
dc.date.accessioned: 2025-10-21T12:43:13Z
dc.date.available: 2025-10-21T12:43:13Z
dc.date.issued: 2025-09
dc.description.abstract

Vietnamese is an isolating language with rich, productive compounding, but it offers no morphosyntactic, phonotactic, or phonological evidence for a linguistic level between the syllable and the phrase (Schiering et al. 2010). Following Nguyen and Ingram (2007), we model an artificial listener with a Random Forest classifier to study how well compounds and phrases can be distinguished phonetically. This machine learning algorithm represents the maximal potential of a system to differentiate the two classes on the basis of phonetics alone. It also ranks the importance of each phonetic correlate for differentiating the classes, so the interpretation can go beyond whether a difference exists on a particular phonetic dimension and address how important that difference is. The results confirm that the two classes can be phonetically separated only under conditions of maximal contrast, and that maximal contrast is realized through juncture marking. Furthermore, we show that the two classes cannot be perfectly separated even under maximal contrast, and that the phonetic data yield an across-the-board preference for a compound interpretation, even when the Random Forest classifier was trained on maximal-contrast data.

dc.identifier.doi: 10.6519/TJL.202509_23(3).0002
dc.identifier.issn: 1729-4649
dc.identifier.uri: https://www.zora.uzh.ch/handle/20.500.14742/237861
dc.language.iso: eng
dc.subject.ddc: 410 Linguistics
dc.subject.ddc: 490 Other languages
dc.subject.ddc: 400 Language
dc.title

Telling Compounds and Phrases Apart in Vietnamese. A Random Forest Classification

dc.type: article
dcterms.accessRights: info:eu-repo/semantics/openAccess
dcterms.bibliographicCitation.journaltitle: Taiwan Journal of Linguistics
dcterms.bibliographicCitation.number: 23.3
dcterms.bibliographicCitation.originalpublishername: National Chengchi University
dcterms.bibliographicCitation.pageend: 48
dcterms.bibliographicCitation.pagestart: 23
dcterms.bibliographicCitation.volume: 23
dspace.entity.type: Publication
uzh.contributor.author: Van Ommen, Sandrien
uzh.contributor.author: Torres Orjuela, Catalina
uzh.contributor.author: Dong, Lam Quang
uzh.contributor.author: Giraud, Anne-Lise
uzh.contributor.author: Bickel, Balthasar
uzh.document.availability: published_version
uzh.identifier.doi: https://doi.org/10.5167/uzh-280142
uzh.oastatus.zora: Gold
uzh.publication.citation: Van Ommen, S., Torres Orjuela, C., Dong, L. Q., Giraud, A.-L., & Bickel, B. (2025). Telling Compounds and Phrases Apart in Vietnamese. A Random Forest Classification. Taiwan Journal of Linguistics, 23(3), 23–48. https://doi.org/10.6519/TJL.202509_23(3).0002
uzh.publication.freeAccessAt: doi
uzh.publication.originalwork: original
uzh.publication.publishedStatus: final
uzh.workflow.doaj: Yes, journal is listed in DOAJ.
uzh.workflow.fulltextStatus: public
uzh.workflow.rightsCheck: offen
Files

Original bundle

Name: 23.3.2_cam.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format
