BACKGROUND: Speech perception relies on a variety of spectral and temporal acoustic features available in the acoustic signal. Voice-onset time (VOT) is considered a cardinal cue for phonetic perception.

METHODS: In the present study, we recorded and compared scalp auditory evoked potentials (AEPs) in response to consonant-vowel (CV) syllables with varying VOT and to non-speech analogues with varying noise-onset time (NOT). In particular, we aimed to investigate the spatio-temporal pattern of acoustic feature processing underlying elemental speech perception and to relate this temporal processing mechanism to specific activations of the auditory cortex.

RESULTS: The results show that the characteristic AEP waveform elicited by CV syllables is comparable to that elicited by non-speech sounds with analogous temporal characteristics. The amplitudes of the N1a and N1b components of the AEP correlated significantly with VOT duration in CV syllables and, likewise, with NOT duration in non-speech sounds. Furthermore, current density maps indicate overlapping supratemporal networks involved in the perception of both speech and non-speech sounds, with a bilateral activation pattern during the N1a time window and a leftward asymmetry during the N1b time window. Detailed regional statistical analysis of activation over the middle and posterior portions of the supratemporal plane (STP) revealed strongly left-lateralized responses over the middle STP for both the N1a and N1b components, and a functional leftward asymmetry over the posterior STP for the N1b component.

CONCLUSION: The present data demonstrate overlapping spatio-temporal brain responses during the perception of temporal acoustic cues in both speech and non-speech sounds. Source estimation indicates a predominant role of the left middle and posterior auditory cortex in discriminating speech and non-speech sounds on the basis of temporal features.
Therefore, consistent with recent fMRI studies, we suggest that similar mechanisms underlie the perception of linguistically different but acoustically equivalent auditory events at the level of basic auditory analysis.