Abstract
Traditional visual sensor technology is firmly rooted in the concept of sequences of image frames. The sequence of stroboscopic images in these "frame cameras" is very different compared to the information running from the retina to the visual cortex. While conventional cameras have improved in the direction of smaller pixels and higher frame rates, the basics of image acquisition have remained the same.
Event-based vision sensors were originally known as "silicon retinas" but are now widely called "event cameras." They are a new type of vision sensors that take inspiration from the mechanisms developed by nature for the mammalian retina and suggest a different way of perceiving the world. As in the neural system, the sensed information is encoded in a train of spikes, or so-called events, comparable to the action potential generated in the nerve.
Event-based sensors produce sparse and asynchronous output that represents in- formative changes in the scene. These sensors have advantages in terms of fast response, low latency, high dynamic range, and sparse output. All these char- acteristics are appealing for computer vision and robotic applications, increasing the interest in this kind of sensor. However, since the sensor’s output is very dif- ferent, algorithms applied for frames need to be rethought and re-adapted.
This thesis focuses on several applications of event cameras in scientific scenarios. It aims to identify where they can make the difference compared to frame cam- eras. The presented applications use the Dynamic Vision Sensor (event camera developed by the Sensors Group of the Institute of Neuroinformatics, University of Zurich and ETH).
To explore some applications in more extreme situations, the first chapters of the thesis focus on the characterization of several advanced versions of the standard DVS. The low light condition represents a challenging situation for every vision sensor. Taking inspiration from standard Complementary Metal Oxide Semiconductor (CMOS) technology, the DVS pixel performances in a low light scenario can be improved, increasing sensitivity and quantum efficiency, by using back-side illumination. This thesis characterizes the so-called Back Side Illumination DAVIS (BSI DAVIS) camera and shows results from its application in calcium imaging of neural activity. The BSI DAVIS has shown better performance in the low light scene due to its high Quantum Efficiency (QE) of 93% and proved to be the best type of technology for microscopy application. The BSI DAVIS allows detecting fast dynamic changes in neural fluorescent imaging using the green fluorescent calcium indicator GCaMP6f.
Event camera advances have pushed the exploration of event-based cameras in computer vision tasks. Chapters of this thesis focus on two of the most active research areas in computer vision: human pose estimation and hand gesture classification. Both chapters report the datasets collected to achieve the task, fulfilling the continuous need for data for this kind of new technology. The Dynamic Vision Sensor Human Pose dataset (DHP19) is an extensive collection of 33 whole-body human actions from 17 subjects. The chapter presents the first benchmark neural network model for 3D pose estimation using DHP19. The network archives a mean error of less than 8 mm in the 3D space, which is comparable with frame-based Human Pose Estimation (HPE) methods using frames. The gesture classification chapter reports an application running on a mobile device and explores future developments in the direction of embedded portable low power devices for online processing. The sparse output from the sensor suggests using a small model with a reduced number of parameters and low power consumption.
The thesis also describes pilot results from two other scientific imaging applica- tions for raindrop size measurement and laser speckle analysis presented in the appendices.