ZORA (Zurich Open Repository and Archive)

DeepFry: Identifying Vocal Fry Using Deep Neural Networks

Chernyak, Bronya Roni; Ben Simon, Talia; Segal, Yael; Steffman, Jeremy; Chodroff, Eleanor; Cole, Jennifer; Keshet, Joseph (2022). DeepFry: Identifying Vocal Fry Using Deep Neural Networks. In: Interspeech 2022, Incheon, Korea, 18 September 2022 - 22 September 2022, 3578-3582.

Abstract

Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly for languages where creak is frequently used. This paper proposes a deep learning model to detect creaky voice in fluent speech. The model is composed of an encoder and a classifier trained together. The encoder takes the raw waveform and learns a representation using a convolutional neural network. The classifier is implemented as a multi-headed fully-connected network trained to detect creaky voice, voicing, and pitch, where the last two are used to refine creak prediction. The model is trained and tested on speech of American English speakers, annotated for creak by trained phoneticians. We evaluated the performance of our system using two encoders: one is tailored for the task, and the other is based on a state-of-the-art unsupervised representation. Results suggest our best-performing system has improved recall and F1 scores compared to previous methods on unseen data.
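The abstract reports results as recall and F1. As a minimal sketch of how such scores can be computed, assuming the predictions are binary per-frame labels (1 = creak, 0 = no creak), the following uses the standard definitions. The function name `precision_recall_f1` and the toy label sequences are hypothetical illustrations, not the authors' evaluation code or data.

```python
def precision_recall_f1(reference, predicted):
    """Return (precision, recall, F1) for binary labels, where 1 marks creak.

    Assumption: both inputs are equal-length sequences of 0/1 frame labels.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r == 1 and p == 1)
    false_pos = sum(1 for r, p in pairs if r == 0 and p == 1)
    false_neg = sum(1 for r, p in pairs if r == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example (made-up labels): 3 true positives, 1 false positive, 1 false negative.
reference = [0, 1, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(reference, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall measures how many of the annotated creaky frames the system finds, while F1 balances recall against precision, so a detector cannot score well simply by labeling every frame as creak.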

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 06 Faculty of Arts > Institute of Computational Linguistics
Dewey Decimal Classification: 410 Linguistics; 000 Computer science, knowledge & systems
Scopus Subject Areas: Social Sciences & Humanities > Language and Linguistics; Physical Sciences > Human-Computer Interaction; Physical Sciences > Signal Processing; Physical Sciences > Software; Physical Sciences > Modeling and Simulation
Language: English
Event End Date: 22 September 2022
Deposited On: 24 Apr 2024 08:00
Last Modified: 25 Apr 2024 20:00
OA Status: Green
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.21437/interspeech.2022-10756
Deposited PDF: Published Version, English

Downloads

5 downloads since deposited on 24 Apr 2024
5 downloads in the past 12 months