Here, we applied a multi-feature mismatch negativity (MMN) paradigm in order to systematically investigate the neuronal representation of vowels and temporally manipulated CV syllables in a homogeneous sample of string players and non-musicians. Based on previous work indicating an increased sensitivity of the musicians' auditory system, we expected to find that musically trained subjects will elicit increased MMN amplitudes in response to temporal variations in CV syllables, namely voice-onset time (VOT) and duration. In addition, since different vowels are principally distinguished by means of frequency information and musicians are superior in extracting tonal (and thus frequency) information from an acoustic stream, we also expected to provide evidence for an increased auditory representation of vowels in the experts. In line with our hypothesis, we could show that musicians are not only advantaged in the pre-attentive encoding of temporal speech cues, but most notably also of processing vowels. Additional "just noticeable difference" measurements suggested that the musicians' perceptual advantage in encoding speech sounds was more likely driven by the generic constitutional properties of a highly trained auditory system, rather than by its specialisation for speech representations per se. These results shed light on the origin of the often reported advantage of musicians in processing a variety of speech sounds.