Monaural Source Separation Using a Random Forest Classifier


Riday, Cosimo; Bhargava, Saurabh; Hahnloser, Richard H R; Liu, Shih-Chii (2016). Monaural Source Separation Using a Random Forest Classifier. In: Interspeech 2016, San Francisco, CA, USA, 8 September 2016 - 12 September 2016, 3344-3348.

Abstract

We address the problem of separating two audio sources from a single-channel mixture recording. A novel method called Multi Layered Random Forest (MLRF) that learns a binary mask for both sources is presented. Random Forest (RF) classifiers are trained for each frequency band of a source spectrogram. A specialized set of linear transformations is applied to a local time-frequency (T-F) neighborhood of the mixture to capture relevant local statistics. A sampling method is presented that efficiently samples T-F training bins in each frequency band. We draw equal numbers of dominant (more power) training samples from the two sources for RF classifiers that estimate the Ideal Binary Mask (IBM). An estimated IBM in a given layer is used to train an RF classifier in the next higher layer of the MLRF hierarchy. On average, MLRF performs better than deep Recurrent Neural Networks (RNNs) and Non-Negative Sparse Coding (NNSC) in terms of signal-to-noise ratio (SNR) of the reconstructed audio, overall T-F bin classification accuracy, and PESQ and STOI scores. Additionally, we demonstrate the ability of the MLRF to correctly reconstruct T-F bins of the target even when the target has lower power than the interfering source in that frequency band.
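The Ideal Binary Mask and the balanced per-band sampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ideal_binary_mask` and `balanced_sample`, the strict `>` tie rule, and the toy power values are all assumptions made for illustration, and plain nested lists stand in for spectrogram arrays so the example needs only the standard library.

```python
import random


def ideal_binary_mask(power_a, power_b):
    """Return mask[f][t] = 1 where source A has strictly more power than B, else 0.

    power_a and power_b are power spectrograms indexed as [frequency][time].
    The strict comparison (ties go to source B) is an assumption.
    """
    return [
        [1 if a > b else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(power_a, power_b)
    ]


def balanced_sample(mask_row, n_per_source, rng):
    """Draw the same number of time indices for each dominant source in one band.

    Hypothetical helper: it only mirrors the idea of drawing equal numbers of
    dominant T-F bins from the two sources within a frequency band.
    """
    a_bins = [t for t, m in enumerate(mask_row) if m == 1]
    b_bins = [t for t, m in enumerate(mask_row) if m == 0]
    n = min(n_per_source, len(a_bins), len(b_bins))
    return rng.sample(a_bins, n), rng.sample(b_bins, n)


# Toy power spectrograms: 2 frequency bands x 4 time frames (made-up values).
power_a = [[4.0, 1.0, 3.0, 0.5], [0.2, 2.0, 0.1, 5.0]]
power_b = [[1.0, 2.0, 1.0, 2.5], [1.0, 1.0, 0.3, 0.5]]

mask = ideal_binary_mask(power_a, power_b)

# Balanced training indices for the first frequency band.
rng = random.Random(0)
a_idx, b_idx = balanced_sample(mask[0], 2, rng)
```

In a setup like the one the abstract describes, the sampled bins in each band would supply the class labels for that band's classifier, with features taken from the local T-F neighborhood of the mixture; that feature extraction is not shown here.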

Additional indexing

Item Type: Conference or Workshop Item (Speech), refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Neuroinformatics
Dewey Decimal Classification: 570 Life sciences; biology
Language: English
Event End Date: 12 September 2016
Deposited On: 27 Jan 2017 08:24
Last Modified: 29 Aug 2017 12:07
Publisher: Proceedings of Interspeech 2016
Series Name: Proceedings of Interspeech 2016
Number of Pages: 5
Free access at: Official URL. An embargo period may apply.
Publisher DOI: https://doi.org/10.21437/Interspeech.2016-252
Official URL: http://www.isca-speech.org/archive/Interspeech_2016/abstracts/0252.html
