Attention-driven Multi-sensor Selection


Braun, Stefan; Neil, Daniel; Anumula, Jithendar; Ceolini, Enea; Liu, Shih-Chii (2019). Attention-driven Multi-sensor Selection. In: 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14 July 2019 - 19 July 2019, Institute of Electrical and Electronics Engineers.

Abstract

Recent encoder-decoder models for sequence-to-sequence mapping show that integrating both temporal and spatial attention mechanisms into neural networks considerably improves network performance. The use of attention for sensor selection in multi-sensor setups, and the benefit of such an attention mechanism, is less well studied. This work reports on a sensor transformation attention network (STAN) that embeds a sensory attention mechanism to dynamically weigh and combine individual input sensors based on their task-relevant information. We demonstrate the correlation of the attentional signal with changing noise levels of each sensor on the audio-visual GRID dataset with synthetic noise, and on CHiME-4, a multi-microphone real-world noisy dataset. In addition, we demonstrate that the STAN model can deal with sensor removal and addition without retraining, and is invariant to channel order. Compared to a two-sensor model that weighs both sensors equally, the equivalent STAN model has a relative parameter increase of only 0.09%, but reduces the relative character error rate (CER) by up to 19.1% on the CHiME-4 dataset. The attentional signal helps to identify the lower-SNR sensor with up to 94.2% accuracy.
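
To make the attention-based sensor fusion described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch. The module name, layer sizes, and tensor layout are assumptions for illustration, not the authors' implementation: each sensor's feature stream is scored per time step, the scores are normalized across sensors with a softmax, and the weighted streams are summed before being passed to a downstream recognition network.

    import torch
    import torch.nn as nn

    class SensorAttentionFusion(nn.Module):
        # Minimal sketch of attention-driven sensor weighting: a small scoring
        # network (shared across sensors) produces one score per sensor and
        # time step; a softmax across the sensor axis turns the scores into
        # attention weights that form a weighted sum of the feature streams.
        def __init__(self, feature_dim, hidden_dim=32):
            super().__init__()
            # The two-layer scorer and its sizes are illustrative assumptions,
            # not the configuration used in the paper.
            self.score = nn.Sequential(
                nn.Linear(feature_dim, hidden_dim),
                nn.Tanh(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, sensors):
            # sensors: (batch, num_sensors, time, feature_dim)
            scores = self.score(sensors).squeeze(-1)        # (batch, num_sensors, time)
            weights = torch.softmax(scores, dim=1)          # normalize across sensors
            fused = (weights.unsqueeze(-1) * sensors).sum(dim=1)  # (batch, time, feature_dim)
            return fused, weights

    # Example: two sensors, 100 time steps, 40-dimensional features per frame.
    x = torch.randn(8, 2, 100, 40)
    fused, attn = SensorAttentionFusion(feature_dim=40)(x)
    print(fused.shape, attn.shape)  # (8, 100, 40) and (8, 2, 100)

Because the scorer is shared across sensors and the softmax runs over the sensor axis, the same module accepts a variable number of sensors and is indifferent to their order, which is consistent with the sensor removal/addition and channel-order invariance reported in the abstract.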

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Neuroinformatics
Dewey Decimal Classification: 570 Life sciences; biology
Scopus Subject Areas: Physical Sciences > Software; Physical Sciences > Artificial Intelligence
Language: English
Event End Date: 19 July 2019
Deposited On: 11 Feb 2020 15:28
Last Modified: 27 Jan 2022 01:10
Publisher: Institute of Electrical and Electronics Engineers
Series Name: Proceedings of the International Joint Conference on Neural Networks
ISSN: 2161-4393
ISBN: 9781728119854
Additional Information: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
OA Status: Green
Publisher DOI: https://doi.org/10.1109/ijcnn.2019.8852396