Combining Deep Neural Networks and Beamforming for Real-Time Multi-Channel Speech Enhancement using a Wireless Acoustic Sensor Network


Ceolini, Enea; Liu, Shih-Chii (2019). Combining Deep Neural Networks and Beamforming for Real-Time Multi-Channel Speech Enhancement using a Wireless Acoustic Sensor Network. In: 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), Pittsburgh, 13 October 2019 - 16 October 2019, IEEE.

Abstract

This work presents a multi-channel speech enhancement algorithm that combines a neural network with beamforming, deployed in real time on a wireless acoustic sensor network (WASN) of distributed microphones. We combine spectral mask estimation via a deep neural network with spatial filtering to obtain a speech enhancement system that remains robust even in difficult real-world scenarios (e.g. speech in noise, reverberant environments). Although the model is trained on simulated data, it performs comparably well on real-world tasks relative to an ideal oracle beamformer. We show that the model can be deployed on a WASN platform that allows for remote placement of microphones and on-board computing. We consider models with a small parameter count and low computational complexity. The system achieves signal-to-distortion ratio (SDR) improvements of up to 10 dB in a real-world scenario and runs in real time on board the WASN, with a latency on the order of hundreds of milliseconds.
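The abstract describes combining a neural-network spectral mask with spatial filtering. A common way to realize this (used in many mask-based enhancement systems; the paper's exact beamformer formulation is not specified on this page) is to use the predicted mask to estimate speech and noise spatial covariance matrices, then apply an MVDR beamformer. The sketch below is a hedged illustration of that general technique, not the authors' implementation; the function name `mvdr_from_mask` and all shapes are assumptions for the example.

```python
import numpy as np

def mvdr_from_mask(stft, mask, ref_mic=0):
    """Mask-driven MVDR beamformer (generic sketch, not the paper's code).

    stft: complex array, shape (mics, frames, freqs) - multi-channel STFT.
    mask: real array in [0, 1], shape (frames, freqs) - speech presence
          mask, e.g. predicted by a neural network.
    Returns the enhanced single-channel STFT, shape (frames, freqs).
    """
    n_mics, n_frames, n_freqs = stft.shape
    out = np.zeros((n_frames, n_freqs), dtype=complex)
    for f in range(n_freqs):
        X = stft[:, :, f]  # (mics, frames) for this frequency bin
        m = mask[:, f]
        # Mask-weighted spatial covariance estimates for speech and noise
        phi_s = (m * X) @ X.conj().T / max(m.sum(), 1e-8)
        phi_n = ((1.0 - m) * X) @ X.conj().T / max((1.0 - m).sum(), 1e-8)
        phi_n += 1e-6 * np.eye(n_mics)  # diagonal loading for stability
        # MVDR weights (reference-channel form):
        #   w = (Phi_n^{-1} Phi_s / trace(Phi_n^{-1} Phi_s)) e_ref
        num = np.linalg.solve(phi_n, phi_s)
        w = num[:, ref_mic] / np.trace(num)
        out[:, f] = w.conj() @ X  # y = w^H x per frame
    return out
```

In such pipelines the network only has to predict a time-frequency mask (a small, cheap model, consistent with the low parameter count mentioned above), while the beamformer supplies the spatial selectivity across the distributed microphones.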


Statistics

Citations

3 citations in Web of Science®
4 citations in Scopus®

Downloads

93 downloads since deposited on 12 Feb 2020
78 downloads in the last 12 months

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Neuroinformatics
Dewey Decimal Classification: 570 Life sciences; biology
Scopus Subject Areas: Physical Sciences > Human-Computer Interaction; Physical Sciences > Signal Processing
Language: English
Event End Date: 16 October 2019
Deposited On: 12 Feb 2020 09:48
Last Modified: 15 Jun 2022 07:18
Publisher: IEEE
ISBN: 9781728108247
Additional Information: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
OA Status: Green
Publisher DOI: https://doi.org/10.1109/mlsp.2019.8918787
