Abstract
This paper reports a study of methods for real-time speaker identification using the output of an event-based silicon cochlea. The methods are evaluated on the amount of computation they require and on their classification performance in a speaker identification task. The study uses the binaural AEREAR2 silicon cochlea, with 64 frequency channels and 512 output neurons. Auditory features representing fading histograms of inter-spike intervals and channel activity distributions are extracted from the cochlea spikes. These feature vectors are then classified by a linear Support Vector Machine, which is trained on a subset of 40 speakers (20 male, 20 female) from the TIMIT database. Speakers are identified with >90% accuracy during each sentence utterance, at an average latency of 700 ± 200 ms from the start of the sentence.
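To make the feature description concrete, the following is a minimal sketch of how fading histograms of inter-spike intervals and channel activity could be computed from a stream of cochlea spikes. It is an illustration under stated assumptions, not the implementation used in the study: the exponential decay, the time constant `tau`, the logarithmic bin spacing, the bin count, the interval range, and all names (`FadingHistogram`, `isi_bin`, `extract_features`) are hypothetical choices that the abstract does not specify.

```python
import math


class FadingHistogram:
    """Histogram whose counts decay exponentially over time (assumed form)."""

    def __init__(self, num_bins, tau):
        self.counts = [0.0] * num_bins
        self.tau = tau          # decay time constant in seconds (assumed value)
        self.last_time = None   # timestamp of the most recent update

    def add(self, bin_index, t):
        # Fade all existing counts by the time elapsed since the last update,
        # then add the new event to its bin.
        if self.last_time is not None:
            decay = math.exp(-(t - self.last_time) / self.tau)
            self.counts = [c * decay for c in self.counts]
        self.last_time = t
        self.counts[bin_index] += 1.0

    def normalized(self):
        # Return the counts scaled to sum to 1 (all zeros if the histogram is empty).
        total = sum(self.counts)
        return [c / total for c in self.counts] if total > 0 else list(self.counts)


def isi_bin(isi, min_isi, max_isi, num_bins):
    """Map an inter-spike interval to a logarithmically spaced bin (assumed layout)."""
    isi = min(max(isi, min_isi), max_isi)  # clamp to the covered range
    frac = math.log(isi / min_isi) / math.log(max_isi / min_isi)
    return min(int(frac * num_bins), num_bins - 1)


def extract_features(spikes, num_channels=64, num_isi_bins=16, tau=0.1,
                     min_isi=1e-4, max_isi=1e-1):
    """Build a feature vector from (timestamp_seconds, channel) pairs sorted by time."""
    isi_hist = FadingHistogram(num_isi_bins, tau)
    chan_hist = FadingHistogram(num_channels, tau)
    last_spike = {}  # most recent spike time per channel
    for t, ch in spikes:
        chan_hist.add(ch, t)
        if ch in last_spike:
            isi = t - last_spike[ch]
            isi_hist.add(isi_bin(isi, min_isi, max_isi, num_isi_bins), t)
        last_spike[ch] = t
    # Concatenate the two normalized histograms into one feature vector.
    return isi_hist.normalized() + chan_hist.normalized()


if __name__ == "__main__":
    demo_spikes = [(0.000, 3), (0.001, 3), (0.002, 5)]
    print(len(extract_features(demo_spikes)))
```

Because each histogram is updated per spike and needs no buffering of past events, a feature vector of this kind can be read out at any moment and passed to a linear classifier, which fits the real-time setting described above.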