Enhancing the objectivity of interactive formant estimation: introducing euclidean distance measure and numerical conditions for numbers and frequency ranges of formants


Kathiresan, Thayabaran; Maurer, Dieter; Dellwo, Volker (2017). Enhancing the objectivity of interactive formant estimation: introducing euclidean distance measure and numerical conditions for numbers and frequency ranges of formants. In: Trouvain, Juergen; Steiner, Ingmar; Moebius, Bernd. Elektronische Sprachsignalverarbeitung 2017. Dresden: TUDpress, 130-137.

Abstract

Current formant measurement studies of vowel sounds generally use a Linear Predictive Coding (LPC) algorithm and rely on an interactive method of formant estimation which includes a comparison of measured formant tracks and characteristics of the spectrogram. Thereby, the selection of LPC parameters is based on the assumption that the number of poles for the analysis of a given frequency range is age- and gender-specific. However, when crosschecking measured formant tracks with the spectrogram, mismatches occur in a significant number of cases. In these cases, the investigators try to minimize these mismatches by modifying the number of poles of LPC. Such an interaction is based on phonetic knowledge, analytical experience and related expectations. Several authors have pointed towards the lack of objectivity and the inherent circularity as well as the fact that similar formant estimations performed by different researchers may yield different results. As of yet, the issue of an improvement and objectification of formant estimation procedure is still a matter of debate. The present paper describes such a corresponding approach: basing the LPC pole-number selection on objective criteria by introducing Euclidean distance measure and formant frequency conditions as references for interactive formant frequency estimation. The paper further presents and discusses the results of a pilot evaluation using the method proposed on 224 long Standard German vowel sounds /i-y-e-ø-ɛ-a-o-u/ produced by eight children, ten women and ten men on fundamental frequencies of 262 Hz (children), 220 Hz (women) and 131 Hz (men), respectively.
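The abstract does not spell out how the Euclidean distance measure and the formant conditions are combined, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that formant estimates have already been obtained for several LPC pole numbers, that a reference vector of formant frequencies is available (for example, read off the spectrogram), and that the numerical conditions take the form of an expected formant count plus an allowed frequency range per formant. All function names, frequency values and ranges below are hypothetical and chosen for illustration.

```python
import math


def euclidean_distance(measured, reference):
    """Euclidean distance between two equally long lists of frequencies (Hz)."""
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(measured, reference)))


def satisfies_conditions(formants, expected_count, ranges):
    """Check the assumed numerical conditions: formant count and per-formant range."""
    if len(formants) != expected_count:
        return False
    return all(low <= f <= high for f, (low, high) in zip(formants, ranges))


def select_pole_number(candidates, reference, ranges):
    """Return (pole_number, distance) for the admissible candidate closest to
    the reference, or None if no candidate meets the conditions."""
    best = None
    for poles, formants in sorted(candidates.items()):
        if not satisfies_conditions(formants, len(reference), ranges):
            continue  # wrong number of formants or a formant out of range
        distance = euclidean_distance(formants, reference)
        if best is None or distance < best[1]:
            best = (poles, distance)
    return best


# Hypothetical formant estimates (Hz) for one vowel at four LPC pole numbers.
candidates = {
    10: [300.0, 2200.0, 3000.0],
    12: [280.0, 2250.0, 2900.0],
    14: [280.0, 900.0, 2250.0],
    16: [280.0, 2250.0],  # only two formants found: fails the count condition
}
# Hypothetical reference values and allowed ranges for F1, F2 and F3.
reference = [290.0, 2240.0, 2950.0]
ranges = [(200.0, 1000.0), (600.0, 3000.0), (1500.0, 4000.0)]

result = select_pole_number(candidates, reference, ranges)
print(result)
```

With these illustrative numbers, the candidate for 12 poles lies closest to the reference and is selected, while the 16-pole candidate is rejected before any distance is computed. Fixing the conditions and the distance criterion in advance is what would make the pole-number choice reproducible across investigators.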

Additional indexing

Item Type: Book Section, refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
06 Faculty of Arts > Institute of Computational Linguistics
06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
410 Linguistics
Language: English
Date: 2017
Deposited On: 21 Feb 2018 19:39
Last Modified: 03 Dec 2020 15:19
Publisher: TUDpress
Series Name: Studientexte zur Sprachkommunikation
Number: 86
ISSN: 0940-6832
ISBN: 978-3-95908-094-1
Funders: Swiss National Science Foundation
OA Status: Green
Free access at: Official URL. An embargo period may apply.
Official URL: http://essv2017.coli.uni-saarland.de/pdfs/Kathiresan.pdf
Related URLs: http://essv2017.coli.uni-saarland.de/program.html (Organisation)
http://essv2017.coli.uni-saarland.de/pdfs/proceedings.pdf (Publisher)
Project Information:
  • Funder: SNSF
  • Grant ID:
  • Project Title: Swiss National Science Foundation
  • Content: Published Version