Abstract
This paper presents speaker-independent isolated digit recognition experiments based on cochlear image maps computed from the spatio-temporal spike patterns of an Address-Event Representation (AER) silicon cochlea. The cochlear maps used in this study were computed by means of: (i) time-binned spike counts; (ii) low-pass filtered spike trains; and (iii) the Radon spike-count method. These maps were subsequently used as input to a Support Vector Machine (SVM) back-end classifier. The results show promising recognition accuracies on nearly 110 speakers from the TIDIGITS database. In fact, it is shown that despite the limited input dynamic range and the un-modelled nonlinearities produced by the hardware cochlea, the discriminative information present in its spike patterns can potentially be sufficient for a task as complex as speaker-independent isolated keyword recognition. The system achieves over 95% average word recognition accuracy on utterances by an unseen set of speakers.
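To make the first stage of the pipeline concrete, the following is a minimal sketch of the time-binned spike-count map idea: AER spike events (channel, timestamp) are binned into a channel-by-time count matrix, whose flattened form feeds an SVM. All names, parameters (channel count, bin count, the synthetic data generator), and the toy two-class setup are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def spike_count_map(channels, times, n_channels=16, n_bins=20, duration=1.0):
    """Bin AER spike events into an (n_channels x n_bins) count map.

    channels: array of cochlea channel indices, one per spike
    times:    array of spike timestamps in [0, duration)
    """
    m = np.zeros((n_channels, n_bins))
    bin_idx = np.minimum(
        (np.asarray(times) / duration * n_bins).astype(int), n_bins - 1
    )
    for c, b in zip(channels, bin_idx):
        m[c, b] += 1
    return m

# Synthetic stand-in for cochlea output: two "word" classes whose spike
# activity concentrates on different channel regions (hypothetical data).
rng = np.random.default_rng(0)

def synth_utterance(label, n_spikes=200):
    center = 4 if label == 0 else 11
    ch = np.clip(rng.normal(center, 2.0, n_spikes).astype(int), 0, 15)
    t = rng.uniform(0.0, 1.0, n_spikes)
    return spike_count_map(ch, t).ravel()  # flatten map -> feature vector

y = np.array([0] * 30 + [1] * 30)
X = np.array([synth_utterance(label) for label in y])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = SVC(kernel="linear").fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

On this toy data the two classes are linearly separable in map space, so the SVM attains high accuracy; the paper's task replaces the synthetic generator with real cochlea spike trains from TIDIGITS utterances.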