Event-driven Pipeline for Low-latency Low-compute Keyword Spotting and Speaker Verification System


Ceolini, Enea; Anumula, Jithendar; Braun, Stefan; Liu, Shih-Chii (2019). Event-driven Pipeline for Low-latency Low-compute Keyword Spotting and Speaker Verification System. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 12 May 2019 - 17 May 2019.

Abstract

This work presents an event-driven acoustic sensor processing pipeline to power a low-resource voice-activated smart assistant. The pipeline includes four major steps: localization, source separation, keyword spotting (KWS) and speaker verification (SV). It is driven by a front-end binaural spiking silicon cochlea sensor. The timing information carried by the output spikes of the cochlea provides spatial cues for localization and source separation. Spike features are generated with low latency from the separated source spikes and are used by both KWS and SV, which rely on state-of-the-art deep recurrent neural network architectures with a small memory footprint. Evaluation on a self-recorded event dataset based on TIDIGITS shows accuracies of over 93% and 88% on KWS and SV, respectively, with a minimum system latency of 5 ms on a limited-resource device.
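The abstract says only that the spike timing of the binaural cochlea provides spatial cues for localization; it does not describe the method. As an illustration, here is a minimal sketch of one standard approach under stated assumptions, not the authors' implementation: estimate the interaural time difference (ITD) from the left and right spike trains by histogramming pairwise spike-time differences, then map the ITD to an azimuth with the far-field model ITD = d·sin(θ)/c. The function names `estimate_itd` and `itd_to_azimuth`, the 1 ms lag window, the 20 µs bin width and the 0.2 m microphone spacing are all hypothetical choices.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, in air at roughly 20 degrees C


def estimate_itd(left_spikes, right_spikes, max_lag=1e-3, bin_width=2e-5):
    """Estimate the ITD in seconds (hypothetical helper, not from the paper).

    Histograms all pairwise differences (right spike time minus left spike
    time) within +/- max_lag and returns the centre of the fullest bin.
    A positive ITD means the right channel fires later, i.e. the source
    is closer to the left microphone.
    """
    left = np.asarray(left_spikes, dtype=float)
    right = np.asarray(right_spikes, dtype=float)
    # All pairwise differences, shape (len(right), len(left)), flattened.
    diffs = (right[:, None] - left[None, :]).ravel()
    diffs = diffs[np.abs(diffs) <= max_lag]
    if diffs.size == 0:
        return 0.0  # no coincident spikes within the lag window
    # Bins centred on k * bin_width for k = -n .. n.
    n = int(round(max_lag / bin_width))
    edges = (np.arange(-n, n + 2) - 0.5) * bin_width
    counts, edges = np.histogram(diffs, bins=edges)
    peak = int(np.argmax(counts))
    return 0.5 * (edges[peak] + edges[peak + 1])


def itd_to_azimuth(itd, mic_distance=0.2):
    """Map an ITD to azimuth in degrees (positive = towards the left mic).

    Uses the far-field model itd = mic_distance * sin(theta) / c; the
    0.2 m spacing is an assumed value, not the sensor's real geometry.
    """
    s = np.clip(itd * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


# Synthetic example: the right channel lags the left by 300 microseconds.
left = np.arange(0.0, 1.0, 0.01)
right = left + 3e-4
itd = estimate_itd(left, right)
azimuth = itd_to_azimuth(itd)  # roughly 31 degrees with these assumptions
```

Histogramming pairwise differences is a spike-domain stand-in for cross-correlating the two channels; a real event-driven system would typically compute it per frequency channel and over a sliding time window rather than over a whole recording.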

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Neuroinformatics
Dewey Decimal Classification: 570 Life sciences; biology
Language: English
Event End Date: 17 May 2019
Deposited On: 11 Feb 2020 15:15
Last Modified: 16 Feb 2020 07:07
Publisher: IEEE
ISBN: 9781479981311
OA Status: Closed
Publisher DOI: https://doi.org/10.1109/icassp.2019.8683669

Download

Closed Access: Download allowed only for UZH members