Abstract
Voice timbre – the unique acoustic information in a voice by which its speaker can be recognized – is particularly critical in mother-infant interaction. Correct identification of vocal timbre is necessary in order for infants to recognize their mothers as familiar both before and after birth, providing a basis for social bonding between infant and mother. The exact mechanisms underlying infant voice recognition remain ambiguous and have predominantly been studied in terms of cognitive voice recognition abilities of the infant. Here, we show – for the first time – that caregivers actively maximize their chances of being correctly recognized by presenting more details of their vocal timbre through adjustments to their voices known as infant-directed speech (IDS) or baby talk, a vocal register which is wide-spread through most of the world’s cultures. Using acoustic modelling (k-means clustering of Mel Frequency Cepstral Coefficients) of IDS in comparison with adult-directed speech (ADS), we found in two cohorts of speakers - US English and Swiss German mothers - that voice timbre clusters of in IDS are significantly larger to comparable clusters in ADS. This effect leads to a more detailed representation of timbre in IDS with subsequent benefits for recognition. Critically, an automatic speaker identification using a Gaussian-mixture model based on Mel Frequency Cepstral Coefficients showed significantly better performance in two experiments when trained with IDS as opposed to ADS. We argue that IDS has evolved as part of an adaptive set of evolutionary strategies that serve to promote indexical signalling by caregivers to their offspring which thereby promote social bonding via voice and acquiring linguistic systems.