Selected Publications

Despite impressive results in visual-inertial state estimation in recent years, high-speed trajectories with six-degree-of-freedom motion remain challenging for existing estimation algorithms. Aggressive trajectories feature large accelerations and rapid rotational motions, and when they pass close to objects in the environment, they induce large apparent motions in the vision sensors; all of these effects make estimation more difficult. Existing benchmark datasets do not address these types of trajectories, instead focusing on low-speed or constrained trajectories that target other tasks such as inspection or driving. We introduce the UZH-FPV Drone Racing dataset, consisting of over 27 sequences, with more than 10 km of flight distance, captured on a first-person-view (FPV) racing quadrotor flown by an expert pilot. The dataset features camera images, inertial measurements, event-camera data, and precise ground truth poses. These sequences are faster and more challenging, in terms of apparent scene motion, than any existing dataset. Our goal is to enable advancement of the state of the art in aggressive motion estimation by providing a dataset that is beyond the capabilities of existing state estimation algorithms.
In ICRA’19

Event cameras are novel bio-inspired vision sensors that output pixel-level intensity changes, called ‘events’, instead of traditional video images. These asynchronous sensors naturally respond to motion in the scene with very low latency (on the order of microseconds) and have a very high dynamic range. These features, along with very low power consumption, make event cameras an ideal sensor for fast robot localization and wearable applications, such as AR/VR and gaming. Considering these applications, we present a method to track the 6-DOF pose of an event camera in a known environment, which we assume is described by a photometric 3D map (i.e., intensity plus depth information) built via classic dense 3D reconstruction algorithms. Our approach uses the raw events directly, without intermediate features, within a maximum-likelihood framework to estimate the camera motion that best explains the events via a generative model. We successfully evaluate the method using both simulated and real data, and show improved results over the state of the art. We release the datasets to the public to foster reproducibility and research in this topic.
In ICRA’19

Event cameras measure changes of intensity asynchronously, in the form of a stream of events, which encode per-pixel brightness changes. In the last few years, their outstanding properties (asynchronous sensing, no motion blur, high dynamic range) have led to exciting vision applications with very low latency and high robustness. However, these sensors are still scarce and expensive, which slows down the progress of the research community. Hence, there is a huge demand for cheap, high-quality, labeled synthetic event data for algorithm prototyping, deep learning, and algorithm benchmarking. The development of such a simulator, however, is not trivial, since event cameras work fundamentally differently from frame-based cameras. We present the first event camera simulator that can generate a large amount of reliable event data. The key component of our simulator is a theoretically sound, adaptive rendering scheme that only samples frames when necessary, through a tight coupling between the rendering engine and the event simulator. We release ESIM as open source.
In CoRL’18
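To make the adaptive-sampling idea concrete, here is a minimal sketch of the idealized event generation model that a simulator of this kind builds on: a pixel fires an event each time its log intensity moves by a contrast threshold C away from the level at which it last fired. The sketch assumes a constant threshold, linear interpolation of log intensity between two rendered samples, and a brightness-based step-size rule; the function and parameter names (`events_between_samples`, `next_timestep`, `lambda_frac`, `dt_max`) are hypothetical and are not ESIM's actual interface.

```python
def events_between_samples(t0, L0, t1, L1, ref, C):
    """Generate events for one pixel between two rendered samples.

    Assumes log intensity varies linearly from L0 at t0 to L1 at t1, and that
    |L0 - ref| < C, where ref is the log intensity at the pixel's last event.
    Returns (events, new_ref); each event is (timestamp, polarity)."""
    events = []
    if L1 == L0:
        return events, ref
    polarity = 1 if L1 > L0 else -1
    level = ref + polarity * C  # next threshold crossing in the direction of change
    while (polarity > 0 and level <= L1) or (polarity < 0 and level >= L1):
        # Linear interpolation places the crossing between the two samples.
        t = t0 + (level - L0) / (L1 - L0) * (t1 - t0)
        events.append((t, polarity))
        ref = level
        level = ref + polarity * C
    return events, ref


def next_timestep(max_abs_dL_dt, C, lambda_frac=0.5, dt_max=1e-2):
    """Hypothetical adaptive rule: pick dt so that the fastest-changing pixel
    moves by at most lambda_frac * C in log intensity before the next render."""
    if max_abs_dL_dt <= 0.0:
        return dt_max
    return min(dt_max, lambda_frac * C / max_abs_dL_dt)
```

For example, `events_between_samples(0.0, 0.0, 1.0, 0.5, 0.0, 0.2)` yields two positive events at t ≈ 0.4 and t ≈ 0.8 and a new reference level of about 0.4. Interpolating between samples, rather than thresholding each rendered frame independently, lets event timestamps fall between frames; the adaptive step keeps that interpolation accurate by rendering more often when brightness changes quickly.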

We present a method that leverages the complementarity of event cameras and standard cameras to track visual features with low latency. Event cameras are novel sensors that output pixel-level brightness changes, called ‘events’. They offer significant advantages over standard cameras, namely a very high dynamic range, no motion blur, and a latency on the order of microseconds. However, because the same scene pattern can produce different events depending on the motion direction, establishing event correspondences across time is challenging. By contrast, standard cameras provide intensity measurements (frames) that do not depend on motion direction. Our method extracts features on frames and subsequently tracks them asynchronously using events, thereby exploiting the best of both types of data: the frames provide a photometric representation that does not depend on motion direction and the events provide low latency updates. In contrast to previous works, which are based on heuristics, this is the first principled method that uses raw intensity measurements directly, based on a generative event model within a maximum-likelihood framework. As a result, our method produces feature tracks that are both more accurate (subpixel accuracy) and longer than the state of the art, across a wide variety of scenes.
In ECCV’18
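The generative model behind this kind of tracker can be sketched with the linearized brightness-constancy relation: over a short interval Δt, the log-brightness increment at a pixel is approximately ΔL ≈ -∇L · v Δt, where ∇L is the log-intensity gradient taken from the frame and v is the optical flow. The observed increment comes from the events, e.g., by summing each pixel's event polarities times the contrast threshold. The sketch below assumes a known threshold, a pure-translation patch, and i.i.d. Gaussian noise, so the maximum-likelihood flow reduces to linear least squares; this is a simplification and not necessarily the objective used in the paper, and the function names are hypothetical.

```python
def predicted_increment(grad, v, dt):
    """Linearized generative model: dL ≈ -(grad · v) * dt at each pixel.

    grad: list of (gx, gy) log-intensity gradients from the frame patch.
    v: (vx, vy) optical flow in pixels per unit time."""
    return [-(gx * v[0] + gy * v[1]) * dt for gx, gy in grad]


def estimate_flow(grad, observed, dt):
    """Least-squares flow v with predicted_increment(grad, v, dt) ≈ observed.

    Under i.i.d. Gaussian noise on the increments this is the maximum-likelihood
    estimate. Solves the 2x2 normal equations (A^T A) v = A^T b, A_i = -dt*(gx, gy)."""
    sxx = sxy = syy = bx = by = 0.0
    for (gx, gy), dl in zip(grad, observed):
        ax, ay = -dt * gx, -dt * gy
        sxx += ax * ax
        sxy += ax * ay
        syy += ay * ay
        bx += ax * dl
        by += ay * dl
    det = sxx * syy - sxy * sxy  # >= 0 by Cauchy-Schwarz
    if det <= 1e-9 * sxx * syy:
        # All gradients (nearly) parallel: flow along the edge is unobservable.
        raise ValueError("degenerate gradient patch (aperture problem)")
    vx = (syy * bx - sxy * by) / det
    vy = (sxx * by - sxy * bx) / det
    return vx, vy
```

With gradients `[(1, 0), (0, 1), (1, 1), (2, -1)]`, `dt = 0.01`, and increments generated by `predicted_increment` for v = (3, -2), `estimate_flow` recovers approximately (3, -2). A patch whose gradients are all parallel (a straight edge) makes the normal equations singular, which is the aperture problem; the check raises in that case.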

Event cameras are bio-inspired sensors that offer several advantages, such as low latency, high speed, and high dynamic range, to tackle challenging scenarios in computer vision. This paper presents a solution to the problem of 3D reconstruction from data captured by a stereo event-camera rig moving in a static scene, such as in the context of stereo Simultaneous Localization and Mapping. The proposed method consists of the optimization of an energy function designed to exploit small-baseline spatio-temporal consistency of events triggered across both stereo image planes. To improve the density of the reconstruction and to reduce the uncertainty of the estimation, a probabilistic depth-fusion strategy is also developed. The resulting method has no special requirements on either the motion of the stereo event-camera rig or prior knowledge about the scene. Experiments demonstrate that our method can deal with both texture-rich and sparse scenes, outperforming state-of-the-art stereo methods based on event data image representations.
In ECCV’18
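The abstract does not spell out the fusion rule, so as an illustration only, assume each depth (or inverse-depth) hypothesis for a pixel is a Gaussian with mean μ and variance σ². Two independent, mutually compatible estimates can then be fused by inverse-variance weighting, and the fused variance is never larger than either input, which is one way fusion reduces uncertainty. The compatibility gate `k_sigma` and all function names are hypothetical.

```python
import math


def compatible(mu_a, var_a, mu_b, var_b, k_sigma=2.0):
    """Gate: treat two estimates as the same surface if their means differ by
    at most k_sigma combined standard deviations (hypothetical threshold)."""
    return abs(mu_a - mu_b) <= k_sigma * math.sqrt(var_a + var_b)


def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    """Product of two Gaussians: inverse-variance weighted mean and variance."""
    var = 1.0 / (1.0 / var_a + 1.0 / var_b)
    mu = var * (mu_a / var_a + mu_b / var_b)
    return mu, var


def fuse_all(estimates, k_sigma=2.0):
    """Sequentially fuse (mu, var) estimates of one pixel, skipping any estimate
    that fails the gate against the running fused estimate."""
    mu, var = estimates[0]
    for m, v in estimates[1:]:
        if compatible(mu, var, m, v, k_sigma):
            mu, var = fuse_gaussian(mu, var, m, v)
    return mu, var
```

For example, fusing (1.0, 0.04) with (1.2, 0.04) gives mean 1.1 and variance 0.02, and `fuse_all` ignores a third estimate at 5.0 as an outlier. Sequential gating depends on the order of the estimates; a more careful scheme might cluster hypotheses first.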

We present a unifying framework to solve several computer vision problems with event cameras: motion, depth and optical flow estimation. The main idea of our framework is to find the point trajectories on the image plane that are best aligned with the event data by maximizing an objective function: the contrast of an image of warped events. Our method implicitly handles data association between the events and therefore does not rely on additional appearance information about the scene. The proposed method is not only simple but, more importantly, it is, to the best of our knowledge, the first method that can be successfully applied to such a diverse set of important vision tasks with event cameras.
In CVPR’18
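The contrast-maximization idea fits in a few lines. Assuming a constant optical flow v for all events in a short window (one of the simplest possible motion models), each event (x, y, t) is transported back to a reference time t_ref along v, the warped events are accumulated into an image, and the image variance serves as the contrast; the v that produces the sharpest image best explains the data. This sketch uses nearest-pixel voting and a brute-force grid search instead of smooth interpolation and gradient-based optimization, and all names are hypothetical.

```python
def warped_image(events, v, t_ref, width, height):
    """Image of warped events (IWE): move each event (x, y, t) along flow v back
    to time t_ref and count events per pixel (nearest-pixel voting)."""
    img = [[0] * width for _ in range(height)]
    for x, y, t in events:
        xw = round(x - v[0] * (t - t_ref))
        yw = round(y - v[1] * (t - t_ref))
        if 0 <= xw < width and 0 <= yw < height:
            img[yw][xw] += 1
    return img


def contrast(img):
    """Variance of the IWE pixel values: the objective to maximize."""
    vals = [p for row in img for p in row]
    mean = sum(vals) / len(vals)
    return sum((p - mean) ** 2 for p in vals) / len(vals)


def best_flow(events, candidates, t_ref, width, height):
    """Grid search: return the candidate flow whose IWE has the highest contrast."""
    return max(
        candidates,
        key=lambda v: contrast(warped_image(events, v, t_ref, width, height)),
    )
```

For a vertical edge at x = 5 that moves at 2 pixels per time unit on a 20×10 sensor (events at x = 5 + 2t for t = 0..4 and every row), searching integer flows in [-3, 3] × [-1, 1] returns (2, 0): only the true flow collapses all 50 events onto a single column, which gives the sharpest image.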

In this paper, we introduce the problem of Event-based Multi-View Stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our algorithm is able to produce accurate, semi-dense depth maps and is computationally very efficient (runs in real-time on a CPU or even a smartphone processor).
In IJCV’17
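EMVS, as summarized above, works from a camera with known trajectory; the core intuition can be illustrated by ray density: each event is back-projected as a ray into a volume of candidate depths in a reference view, votes are counted per cell (a disparity space image, DSI), and depth is read off where rays concentrate. The toy 1-D sketch below assumes normalized image coordinates, cameras that only translate along x relative to the reference view, fronto-parallel depth planes, and nearest-bin voting; the names and the 1-D simplification are hypothetical, and a full pipeline would typically also filter the resulting depth map.

```python
def build_dsi(observations, depths, u_bins):
    """Count back-projected rays per (depth plane, reference bin): a 1-D DSI.

    observations: list of (tx, u), an event seen at normalized coordinate u by a
        camera translated by tx along x (reference camera at tx = 0).
    depths: candidate depth planes Z = z in the reference frame.
    u_bins: (u_min, u_max, n) discretization of the reference image line."""
    u_min, u_max, n = u_bins
    step = (u_max - u_min) / n
    dsi = [[0] * n for _ in depths]
    for tx, u in observations:
        for i, z in enumerate(depths):
            # Ray from (tx, 0, 0) with direction (u, 0, 1) hits Z = z at
            # X = tx + u * z, which projects to u_ref = X / z in the reference view.
            u_ref = u + tx / z
            j = int((u_ref - u_min) // step)
            if 0 <= j < n:
                dsi[i][j] += 1
    return dsi


def depth_at_bin(dsi, depths, j):
    """Pick the depth plane with the highest ray density for reference bin j."""
    best = max(range(len(depths)), key=lambda i: dsi[i][j])
    return depths[best]
```

Observing a point at X = 0.2, Z = 2.0 from 11 cameras with tx = 0, 0.05, ..., 0.5 and sweeping depths {1.0, 1.5, 2.0, 2.5, 3.0}, all 11 rays meet in one cell only on the 2.0 plane; on the other planes they spread over several bins. No explicit matching between events is needed, which is the point of the ray-density formulation.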

In contrast to standard cameras, which produce frames at a fixed rate, event cameras respond asynchronously to pixel-level brightness changes, thus enabling the design of new algorithms for high-speed applications with latencies of microseconds. However, this advantage comes at a cost: because the output is composed of a sequence of events, traditional computer-vision algorithms are not applicable, and a paradigm shift is needed. We present an event-based approach for ego-motion estimation, which provides pose updates upon the arrival of each event, thus virtually eliminating latency. Our method is the first work addressing and demonstrating event-based pose tracking of six-degree-of-freedom (6-DOF) motion in realistic and natural scenes, and it is able to track high-speed motions. The method is successfully evaluated in both indoor and outdoor scenes.
In PAMI’17

In this paper, we present the first state estimation pipeline that leverages the complementary advantages of a standard camera and an event camera by fusing events, standard frames, and inertial measurements in a tightly coupled manner. Furthermore, we use our pipeline to demonstrate, to the best of our knowledge, the first autonomous quadrotor flight using an event camera for state estimation, unlocking flight scenarios that were not reachable with traditional visual-inertial odometry, such as low-light environments and high-dynamic-range scenes.
In arXiv

We propose a novel, accurate, tightly coupled visual-inertial odometry pipeline for event cameras that leverages their outstanding properties to estimate the camera ego-motion in challenging conditions, such as high-speed motion or high-dynamic-range scenes. The method tracks a set of features (extracted on the image plane) through time. To do so, we consider events in overlapping spatio-temporal windows and align them using the current camera motion and scene structure, yielding motion-compensated event frames. We then combine these feature tracks in a keyframe-based visual-inertial odometry algorithm based on nonlinear optimization to estimate the camera’s 6-DOF pose and velocity.
In BMVC’17

In this paper, we introduce the problem of Event-based Multi-View Stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our algorithm is able to produce accurate, semi-dense depth maps and is computationally very efficient (runs in real-time on a CPU or even a smartphone processor).
In RA-L’17

We present the world’s first collection of datasets with an event-based camera for high-speed robotics. The data also include intensity images, inertial measurements, and ground truth from a motion-capture system. An event-based camera is a revolutionary vision sensor with three key advantages: a measurement rate that is almost 1 million times faster than standard cameras, a latency of 1 microsecond, and a high dynamic range of 130 decibels (standard cameras only have 60 dB). These properties enable the design of a new class of algorithms for high-speed robotics, where standard cameras suffer from motion blur and high latency. All the data are released as both text files and binary (i.e., rosbag) files.
In IJRR’17
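For readers working with the text release, a small parser can help. The sketch assumes one event per line in a file such as `events.txt`, with whitespace-separated fields `timestamp x y polarity` and polarity encoded as 1/0; check the dataset's own documentation before relying on this layout. It also converts the quoted dynamic ranges to intensity ratios under the common image-sensor convention DR = 20·log10(I_max / I_min), which gives roughly 3.2 × 10⁶ : 1 for 130 dB versus 1000 : 1 for 60 dB. All names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    t: float       # timestamp in seconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # assumed encoding: 1 = brightness increase, 0 = decrease


def parse_events(lines: Iterable[str]) -> List[Event]:
    """Parse 'timestamp x y polarity' lines, skipping blank or malformed ones."""
    events = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        t, x, y, p = fields
        events.append(Event(float(t), int(x), int(y), int(p)))
    return events


def db_to_ratio(db: float) -> float:
    """Intensity ratio for a dynamic range in dB, assuming DR = 20*log10(ratio)."""
    return 10 ** (db / 20)
```

Usage, assuming the file layout above: `with open("events.txt") as f: events = parse_events(f)`. A `NamedTuple` keeps each event lightweight while allowing access by field name.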

In this paper, we introduce the problem of Event-based Multi-View Stereo (EMVS) for event cameras and propose a solution to it. Unlike traditional MVS methods, which address the problem of estimating dense 3D structure from a set of known viewpoints, EMVS estimates semi-dense 3D structure from an event camera with known trajectory. Our algorithm is able to produce accurate, semi-dense depth maps and is computationally very efficient (runs in real-time on a CPU or even a smartphone processor).
In BMVC’16

The transition of visual-odometry technology from research demonstrators to commercial applications naturally raises the question: “what is the optimal camera for vision-based motion estimation?” This question is crucial as the choice of camera has a tremendous impact on the robustness and accuracy of the employed visual odometry algorithm. While many properties of a camera (e.g. resolution, frame-rate, global-shutter/rolling-shutter) could be considered, in this work we focus on evaluating the impact of the camera field-of-view (FoV) and optics (i.e., fisheye or catadioptric) on the quality of the motion estimate. Since the motion-estimation performance depends highly on the geometry of the scene and the motion of the camera, we analyze two common operational environments in mobile robotics: an urban environment and an indoor scene.
In ICRA’16

Recent Posts

Our paper A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth and Optical Flow Estimation (together with Guillermo Gallego and Davide Scaramuzza) was accepted for spotlight presentation at CVPR’18!

My paper EMVS: Event-Based Multi-View Stereo - 3D Reconstruction with an Event Camera in Real-Time about semi-dense 3D reconstruction with an event camera has been accepted to the International Journal of Computer Vision!

This work is the first to show that event cameras can be used to provide accurate, semi-dense 3D maps of a given environment, without explicitly trying to solve data association. You can watch the video here!

I am happy to announce today that my team achieved the first ever closed-loop autonomous flight using an event camera for state estimation! Watch the video here! This achievement is the product of several years of research, and I am very proud of the result. Thanks to the event camera, our quadrotor can “see” at high speed, even in dark environments. The algorithm running onboard the quadrotor is largely based on my recent paper Real-time Visual-Inertial Odometry for Event Cameras using Keyframe-based Nonlinear Optimization, which we extended to use standard frames as an additional sensing modality in a follow-up paper: Hybrid, Frame and Event based Visual Inertial Odometry for Robust, Autonomous Navigation of Quadrotors.

Our paper Real-time Visual-Inertial Odometry for Event Cameras using Keyframe-based Nonlinear Optimization about visual-inertial odometry using an event camera has been accepted at BMVC’17 for oral presentation (acceptance rate: 5.6%)!

You can watch the video here!

Our paper EVO: A Geometric Approach to Event-based 6-DOF Parallel Tracking and Mapping in Real-time has been accepted for publication in the Robotics and Automation Letters (RA-L), and for presentation at ICRA’17!

Teaching

I am a teaching assistant for the course Vision Algorithms for Mobile Robotics given at ETH Zürich.

I also occasionally supervise student projects. The list of projects currently available can be found here.

Contact