A Low Power, Fully Event-Based Gesture Recognition System


Amir, Arnon; Taba, Brian; Berg, David; Melano, Timothy; McKinstry, Jeffrey; Nolfo, Carmelo Di; Nayak, Tapan; Andreopoulos, Alexander; Garreau, Guillaume; Mendoza, Marcela; Kusnitz, Jeff; Debole, Michael; Esser, Steve; Delbruck, Tobi; Flickner, Myron; Modha, Dharmendra (2017). A Low Power, Fully Event-Based Gesture Recognition System. In: Computer Vision and Pattern Recognition (CVPR) 2017, Honolulu, 22 July 2017 - 25 July 2017, 7243-7252.

Abstract

We present the first gesture recognition system implemented end-to-end on event-based hardware, using a TrueNorth neurosynaptic processor to recognize hand gestures in real-time at low power from events streamed live by a Dynamic Vision Sensor (DVS). The biologically inspired DVS transmits data only when a pixel detects a change, unlike traditional frame-based cameras which sample every pixel at a fixed frame rate. This sparse, asynchronous data representation lets event-based cameras operate at much lower power than frame-based cameras. However, much of the energy efficiency is lost if, as in previous work, the event stream is interpreted by conventional synchronous processors. Here, for the first time, we process a live DVS event stream using TrueNorth, a natively event-based processor with 1 million spiking neurons. Configured here as a convolutional neural network (CNN), the TrueNorth chip identifies the onset of a gesture with a latency of 105 ms while consuming less than 200 mW. The CNN achieves 96.5% out-of-sample accuracy on a newly collected DVS dataset (DvsGesture) comprising 11 hand gesture categories from 29 subjects under 3 illumination conditions.
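To make the event-based data representation concrete, here is a minimal illustrative sketch of reducing a DVS event stream (each event a timestamped pixel change with ON/OFF polarity, at the DVS128's 128x128 resolution) into a dense two-channel frame that a conventional CNN could consume. The function name, the tuple layout, and the fixed time window are assumptions for illustration; this is not the preprocessing pipeline used in the paper.

```python
import numpy as np

def events_to_frame(events, height=128, width=128, window_ms=10.0):
    """Accumulate DVS events from a time window into a 2-channel count image.

    `events` is assumed to be a time-sorted list of (t_ms, x, y, polarity)
    tuples, with polarity 0 (OFF, brightness decrease) or 1 (ON, increase).
    Channel 0 counts OFF events per pixel, channel 1 counts ON events.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    if not events:
        return frame
    t0 = events[0][0]  # window starts at the first event's timestamp
    for t, x, y, p in events:
        if t - t0 > window_ms:  # stop once the window has elapsed
            break
        frame[p, y, x] += 1.0
    return frame

# Example: three ON events at pixel (5, 7) inside the window; one OFF
# event at t=20 ms falls outside the 10 ms window and is ignored.
evts = [(0.0, 5, 7, 1), (2.0, 5, 7, 1), (4.0, 5, 7, 1), (20.0, 9, 9, 0)]
f = events_to_frame(evts)  # f[1, 7, 5] == 3.0
```

Because most pixels see no change, such frames are overwhelmingly zero, which is exactly the sparsity that event-based processors like TrueNorth exploit instead of densifying.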

Additional indexing

Item Type: Conference or Workshop Item (Paper), refereed, original work
Communities & Collections: 07 Faculty of Science > Institute of Neuroinformatics
Dewey Decimal Classification: 570 Life sciences; biology
Language: English
Event End Date: 25 July 2017
Deposited On: 23 Feb 2018 09:36
Last Modified: 30 Oct 2018 08:10
Publisher: IEEE
Series Name: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)
Number of Pages: 10
OA Status: Closed
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1109/CVPR.2017.781