Publications

Year: Author:

Paul Voigtlaender, Bastian Leibe
BMVC 2017 Oral

We tackle the task of semi-supervised video object segmentation, i.e. segmenting the pixels belonging to an object in the video using the ground truth pixel mask for the first frame. We build on the recently introduced one-shot video object segmentation (OSVOS) approach which uses a pretrained network and fine-tunes it on the first frame. While achieving impressive performance, at test time OSVOS uses the fine-tuned network in unchanged form and is not able to adapt to large changes in object appearance. To overcome this limitation, we propose Online Adaptive Video Object Segmentation (OnAVOS) which updates the network online using training examples selected based on the confidence of the network and the spatial configuration. Additionally, we add a pretraining step based on objectness, which is learned on PASCAL. Our experiments show that both extensions are highly effective and improve the state of the art on DAVIS to an intersection-over-union score of 85.7%.

» Show BibTeX

@inproceedings{voigtlaender17BMVC,
author = {Paul Voigtlaender and Bastian Leibe},
title = {Online Adaptation of Convolutional Neural Networks for Video Object Segmentation},
booktitle = {BMVC},
year = {2017}
}






Manish Mandad, David Cohen-Steiner, Leif Kobbelt, Pierre Alliez, Mathieu Desbrun
SIGGRAPH 2017

We introduce an efficient computational method for generating dense and low distortion maps between two arbitrary surfaces of same genus. Instead of relying on semantic correspondences or surface parameterization, we directly optimize a variance-minimizing transport plan between two input surfaces that defines an as-conformal-as-possible inter-surface map satisfying a user-prescribed bound on area distortion. The transport plan is computed via two alternating convex optimizations, and is shown to minimize a generalized Dirichlet energy of both the map and its inverse. Computational efficiency is achieved through a coarse-to-fine approach in diffusion geometry, with Sinkhorn iterations modified to enforce bounded area distortion. The resulting inter-surface mapping algorithm applies to arbitrary shapes robustly, with little to no user interaction.

» Show BibTeX

@article{Mandad:2017:Mapping,
author = "Mandad, Manish and Cohen-Steiner, David and Kobbelt, Leif and Alliez, Pierre and Desbrun, Mathieu",
title = "Variance-Minimizing Transport Plans for Inter-surface Mapping",
journal = "ACM Transactions on Graphics",
volume = 36,
number = 4,
year = 2017,
articleno = {39},
}






Dan Koschier, Jan Bender, Nils Thuerey
ACM Transactions on Graphics (SIGGRAPH 2017)

In this paper we present a robust remeshing-free cutting algorithm on the basis of the eXtended Finite Element Method (XFEM) and fully implicit time integration. One of the most crucial points of the XFEM is that integrals over discontinuous polynomials have to be computed on subdomains of the polyhedral elements. Most existing approaches construct a cut-aligned auxiliary mesh for integration. In contrast, we propose a cutting algorithm that includes the construction of specialized quadrature rules for each dissected element without the requirement to explicitly represent the arising subdomains. Moreover, we solve the problem of ill-conditioned or even numerically singular solver matrices during time integration using a novel algorithm that constrains non-contributing degrees of freedom (DOFs) and introduce a preconditioner that efficiently reuses the constructed quadrature weights. Our method is particularly suitable for fine structural cutting as it decouples the added number of DOFs from the cut's geometry and correctly preserves geometry and physical properties by accurate integration. Due to the implicit time integration these fine features can still be simulated robustly using large time steps. As opposed to this, the vast majority of existing approaches either use remeshing or element duplication. Remeshing based methods are able to correctly preserve physical quantities but strongly couple cut geometry and mesh resolution leading to an unnecessary large number of additional DOFs. Element duplication based approaches keep the number of additional DOFs small but fail at correct conservation of mass and stiffness properties. We verify consistency and robustness of our approach on simple and reproducible academic examples while stability and applicability are demonstrated in large scenarios with complex and fine structural cutting.

» Show BibTeX

@ARTICLE{ Koschier2017,
author= {Dan Koschier and Jan Bender and Nils Thuerey},
title= {{Robust eXtended Finite Elements for Complex Cutting of Deformables}},
year= {2017},
journal= {Transaction on Graphics (SIGGRAPH)},
publisher= {ACM},
volume = {36},
number = {4},
pages= {12}
}






Tobias Pohlen, Alexander Hermans, Markus Mathias, Bastian Leibe
Conference on Computer Vision and Pattern Recognition (CVPR'17) Oral

Semantic image segmentation is an essential component of modern autonomous driving systems, as an accurate understanding of the surrounding scene is crucial to navigation and action planning. Current state-of-the-art approaches in semantic image segmentation rely on pre-trained networks that were initially developed for classifying images as a whole. While these networks exhibit outstanding recognition performance (i.e., what is visible?), they lack localization accuracy (i.e., where precisely is something located?). Therefore, additional processing steps have to be performed in order to obtain pixel-accurate segmentation masks at the full image resolution. To alleviate this problem we propose a novel ResNet-like architecture that exhibits strong localization and recognition performance. We combine multi-scale context with pixel-level accuracy by using two processing streams within our network: One stream carries information at the full image resolution, enabling precise adherence to segment boundaries. The other stream undergoes a sequence of pooling operations to obtain robust features for recognition. The two streams are coupled at the full image resolution using residuals. Without additional processing steps and without pre-training, our approach achieves an intersection-over-union score of 71.8% on the Cityscapes dataset.

» Show BibTeX

@inproceedings{Pohlen2017CVPR,
title = {{Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes}},
author = {Pohlen, Tobias and Hermans, Alexander and Mathias, Markus and Leibe, Bastian},
booktitle = {{IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17)}},
year = {2017}
}






Yevhen Kuznietsov, Jörg Stückler, Bastian Leibe
IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'17), Spotlight

Supervised deep learning often suffers from the lack of sufficient training data. Specifically in the context of monocular depth map prediction, it is barely possible to determine dense ground truth depth images in realistic dynamic outdoor environments. When using LiDAR sensors, for instance, noise is present in the distance measurements, the calibration between sensors cannot be perfect, and the measurements are typically much sparser than the camera images. In this paper, we propose a novel approach to depth map prediction from monocular images that learns in a semi-supervised way. While we use sparse ground-truth depth for supervised learning, we also enforce our deep network to produce photoconsistent dense depth maps in a stereo setup using a direct image alignment loss. In experiments we demonstrate superior performance in depth map prediction from single images compared to the state-of-the-art methods.

» Show BibTeX

@inproceedings{kuznietsov2017_semsupdepth,
title = {Semi-Supervised Deep Learning for Monocular Depth Map Prediction},
author = {Kuznietsov, Yevhen and St\"uckler, J\"org and Leibe, Bastian},
booktitle = {IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2017}
}






Aljoša Ošep, Wolfgang Mehner, Markus Mathias, Bastian Leibe
IEEE Int. Conference on Robotics and Automation (ICRA'17)

Tracking in urban street scenes plays a central role in autonomous systems such as self-driving cars. Most of the current vision-based tracking methods perform tracking in the image domain. Other approaches, e.g. based on LIDAR and radar, track purely in 3D. While some vision-based tracking methods invoke 3D information in parts of their pipeline, and some 3D-based methods utilize image-based information in components of their approach, we propose to use image- and world-space information jointly throughout our method. We present our tracking pipeline as a 3D extension of image-based tracking. From enhancing the detections with 3D measurements to the reported positions of every tracked object, we use world- space 3D information at every stage of processing. We accomplish this by our novel coupled 2D-3D Kalman filter, combined with a conceptually clean and extendable hypothesize-and-select framework. Our approach matches the current state-of-the-art on the official KITTI benchmark, which performs evaluation in the 2D image domain only. Further experiments show significant improvements in 3D localization precision by enabling our coupled 2D-3D tracking.

» Show BibTeX

@inproceedings{Osep17ICRA,
title={Combined Image- and World-Space Tracking in Traffic Scenes},
author={O\v{s}ep, Aljo\v{s}a and Mehner, Wolfgang and Mathias, Markus and Leibe, Bastian},
booktitle={ICRA},
year={2017}
}






Dan Koschier, Crispin Deul, Magnus Brand, Jan Bender
IEEE Transactions on Visualization and Computer Graphics

In this paper we present an hp-adaptive algorithm to generate discrete higher-order polynomial Signed Distance Fields (SDFs) on axis-aligned hexahedral grids from manifold polygonal input meshes. Using an orthonormal polynomial basis, we efficiently fit the polynomials to the underlying signed distance function on each cell. The proposed error-driven construction algorithm is globally adaptive and iteratively refines the SDFs using either spatial subdivision (h-refinement) following an octree scheme or by cell-wise adaption of the polynomial approximation's degree (p-refinement). We further introduce a novel decision criterion based on an error-estimator in order to decide whether to apply p- or h-refinement. We demonstrate that our method is able to construct more accurate SDFs at significantly lower memory consumption compared to previous approaches. While the cell-wise polynomial approximation will result in highly accurate SDFs, it can not be guaranteed that the piecewise approximation is continuous over cell interfaces. Therefore, we propose an optimization-based post-processing step in order to weakly enforce continuity. Finally, we apply our generated SDFs as collision detector to the physically-based simulation of geometrically highly complex solid objects in order to demonstrate the practical relevance and applicability of our method.

» Show BibTeX

@Article{KDBB17,
author = {Koschier, Dan and Deul, Crispin and Brand, Magnus and Bender, Jan},
title = {An hp-Adaptive Discretization Algorithm for Signed Distance Field Generation},
journal = {IEEE Transactions on Visualization and Computer Graphics},
year = {2017},
volume = {23},
number = {10},
pages = {1--14},
issn = {1077-2626},
doi = {10.1109/TVCG.2017.2730202}
}






Lingni Ma, Jörg Stückler, Christian Kerl, Daniel Cremers
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'17), to appear

Visual scene understanding is an important capability that enables robots to purposefully act in their environment. In this paper, we propose a novel deep neural network approach to predict semantic segmentation from RGB-D sequences. The key innovation is to train our network to predict multi-view consistent semantics in a self-supervised way. At test time, its semantics predictions can be fused more consistently in semantic keyframe maps than predictions of a network trained on individual views. We base our network architecture on a recent single-view deep learning approach to RGB and depth fusion for semantic object-class segmentation and enhance it with multi-scale loss minimization. We obtain the camera trajectory using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth annotated frames in order to enforce multi-view consistency during training. At test time, predictions from multiple views are fused into keyframes. We propose and analyze several methods for enforcing multi-view consistency during training and testing. We evaluate the benefit of multi-view consistency training and demonstrate that pooling of deep features and fusion over multiple views outperforms single-view baselines on the NYUDv2 benchmark for semantic segmentation. Our end-to-end trained network achieves state-of-the-art performance on the NYUDv2 dataset in single-view segmentation as well as multi-view semantic fusion.

» Show BibTeX

@string{iros="International Conference on Intelligent Robots and Systems (IROS)"}
@InProceedings{lingni17iros,
author = "Lingni Ma and J\"org St\"uckler and Christian Kerl and Daniel Cremers",
title = "Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras",
booktitle = "IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS)",
year = "2017",
}






Anton Kasyanov, Francis Engelmann, Jörg Stückler, Bastian Leibe
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'17), to appear

Complementing images with inertial measurements has become one of the most popular approaches to achieve highly accurate and robust real-time camera pose tracking. In this paper, we present a keyframe-based approach to visual-inertial simultaneous localization and mapping (SLAM) for monocular and stereo cameras. Our method is based on a real-time capable visual-inertial odometry method that provides locally consistent trajectory and map estimates. We achieve global consistency in the estimate through online loop-closing and non-linear optimization. Furthermore, our approach supports relocalization in a map that has been previously obtained and allows for continued SLAM operation. We evaluate our approach in terms of accuracy, relocalization capability and run-time efficiency on public benchmark datasets and on newly recorded sequences. We demonstrate state-of-the-art performance of our approach towards a visual-inertial odometry method in recovering the trajectory of the camera.

» Show BibTeX

@article{Kasyanov2017_VISLAM,
title={{Keyframe-Based Visual-Inertial Online SLAM with Relocalization}},
author={Anton Kasyanov andFrancis Engelmann and J\"org St\"uckler and Bastian Leibe},
journal={ArXiv e-rpints:1702.02175},
year={2017}
}






Francis Engelmann, Jörg Stückler, Bastian Leibe
IEEE Winter Conference on Applications of Computer Vision (WACV'17), to appear.

Inferring the pose and shape of vehicles in 3D from a movable platform still remains a challenging task due to the projective sensing principle of cameras, difficult surface properties, e.g. reflections or transparency, and illumination changes between images. In this paper, we propose to use 3D shape and motion priors to regularize the estimation of the trajectory and the shape of vehicles in sequences of stereo images. We represent shapes by 3D signed distance functions and embed them in a low-dimensional manifold. Our optimization method allows for imposing a common shape across all image observations along an object track. We employ a motion model to regularize the trajectory to plausible object motions. We evaluate our method on the KITTI dataset and show state-of-the-art results in terms of shape reconstruction and pose estimation accuracy.





Jan Bender, Dan Koschier
IEEE Transactions on Visualization and Computer Graphics

In this paper we present a novel Smoo­thed Particle Hy­dro­dy­na­mics (SPH) method for the efficient and stable simulation of incompressible fluids. The most efficient SPH-based approaches enforce incompressibility either on position or velocity level. However, the continuity equation for incompressible flow demands to maintain a constant density and a divergence-free velocity field. We propose a combination of two novel implicit pressure solvers enforcing both a low volume compression as well as a divergence-free velocity field. While a compression-free fluid is essential for realistic physical behavior, a divergence-free velocity field drastically reduces the number of required solver iterations and increases the stability of the simulation significantly. Thanks to the improved stability, our method can handle larger time steps than previous approaches. This results in a substantial performance gain since the computationally expensive neighborhood search has to be performed less frequently. Moreover, we introduce a third optional implicit solver to simulate highly viscous fluids which seamlessly integrates into our solver framework. Our implicit viscosity solver produces realistic results while introducing almost no numerical damping. We demonstrate the efficiency, robustness and scalability of our method in a variety of complex simulations including scenarios with millions of turbulent particles or highly viscous materials.

» Show BibTeX

@article{Bender2017,
author = {Jan Bender and Dan Koschier},
title = {Divergence-Free SPH for Incompressible and Viscous Fluids},
year = {2017},
journal = {IEEE Transactions on Visualization and Computer Graphics},
publisher = {IEEE},
year={2017},
volume={23},
number={3},
pages={1193-1206},
keywords={Smoothed Particle Hydrodynamics;divergence-free fluids;fluid simulation;implicit integration;incompressibility;viscous fluids},
doi={10.1109/TVCG.2016.2578335},
ISSN={1077-2626}
}






André Calero Valdez, Sascha Gebhardt, Torsten Wolfgang Kuhlen, Martina Ziefle
International Conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management 2017

Understanding multi-dimensional data and in particular multi-dimensional dependencies is hard. Information visualization can help to understand this type of data. Still, the problem of how users gain insights from such visualizations is not well understood. Both the visualizations and the users play a role in understanding the data. In a case study, using both, a scatterplot matrix and a HyperSlice with six-dimensional data, we asked 16 participants to think aloud and measured insights during the process of analyzing the data. The amount of insights was strongly correlated with spatial abilities. Interestingly, all users were able to complete an optimization task independently of self-reported understanding of the data.

» Show BibTeX

@Inbook{CaleroValdez2017,
author="Calero Valdez, Andr{\'e}
and Gebhardt, Sascha
and Kuhlen, Torsten W.
and Ziefle, Martina",
editor="Duffy, Vincent G.",
title="Measuring Insight into Multi-dimensional Data from a Combination of a Scatterplot Matrix and a HyperSlice Visualization",
bookTitle="Digital Human Modeling. Applications in Health, Safety, Ergonomics, and Risk Management: Health and Safety: 8th International Conference, DHM 2017, Held as Part of HCI International 2017, Vancouver, BC, Canada, July 9-14, 2017, Proceedings, Part II",
year="2017",
publisher="Springer International Publishing",
address="Cham",
pages="225--236",
isbn="978-3-319-58466-9",
doi="10.1007/978-3-319-58466-9_21",
url="http://dx.doi.org/10.1007/978-3-319-58466-9_21"
}






Claudia Hänel, Ali Can Demiralp, Markus Axer, David Gräßel, Bernd Hentschel, Torsten Wolfgang Kuhlen
19th EG/VGTC Conference on Visualization (EuroVis 2017)

3D-Polarized Light Imaging (3D-PLI) provides data that enables an exploration of brain fibers at very high resolution. However, the visualization poses several challenges. Beside the huge data set sizes, users have to visually perceive the pure amount of information which might be, among other aspects, inhibited for inner structures because of occlusion by outer layers of the brain. We propose a clustering of fiber directions by means of spherical harmonics using a level-of-detail structure by which the user can interactively choose a clustering degree according to the zoom level or details required. Furthermore, the clustering method can be used for the automatic grouping of similar spherical harmonics automatically into one representative. An optional overlay with a direct vector visualization of the 3D-PLI data provides a better anatomical context.



Honorable Mention for Best Short Paper!

» Show BibTeX

@inproceedings {Haenel2017Interactive,
booktitle = {EuroVis 2017 - Short Papers},
editor = {Barbora Kozlikova and Tobias Schreck and Thomas Wischgoll},
title = {{Interactive Level-of-Detail Visualization of 3D-Polarized Light Imaging Data Using Spherical Harmonics}},
author = {H\”anel, Claudia and Demiralp, Ali C. and Axer, Markus and Gr\”assel, David and Hentschel, Bernd and Kuhlen, Torsten W.},
year = {2017},
publisher = {The Eurographics Association},
ISBN = {978-3-03868-043-7},
DOI = {10.2312/eurovisshort.20171145}
}






Jan Bender, Dan Koschier, Tassilo Kugelstadt, Marcel Weiler
ACM SIGGRAPH / EUROGRAPHICS Symposium on Computer Animation (Best Paper Award)

In this paper we introduce a novel micropolar material model for the simulation of turbulent inviscid fluids. The governing equations are solved by using the concept of Smoothed Particle Hydrodynamics (SPH). As already investigated in previous works, SPH fluid simulations suffer from numerical diffusion which leads to a lower vorticity, a loss in turbulent details and finally in less realistic results. To solve this problem we propose a micropolar fluid model. The micropolar fluid model is a generalization of the classical Navier-Stokes equations, which are typically used in computer graphics to simulate fluids. In contrast to the classical Navier-Stokes model, micropolar fluids have a microstructure and therefore consider the rotational motion of fluid particles. In addition to the linear velocity field these fluids also have a field of microrotation which represents existing vortices and provides a source for new ones. However, classical micropolar materials are viscous and the translational and the rotational motion are coupled in a dissipative way. Since our goal is to simulate turbulent fluids, we introduce a novel modified micropolar material for inviscid fluids with a non-dissipative coupling. Our model can generate realistic turbulences, is linear and angular momentum conserving, can be easily integrated in existing SPH simulation methods and its computational overhead is negligible.

» Show BibTeX

@INPROCEEDINGS{Bender2017,
author = {Jan Bender and Dan Koschier and Tassilo Kugelstadt and Marcel Weiler},
title = {A Micropolar Material Model for Turbulent SPH Fluids},
booktitle = {Proceedings of the 2017 ACM SIGGRAPH/Eurographics Symposium on Computer
Animation},
year = {2017},
publisher = {ACM}
}






Dan Koschier, Jan Bender
ACM SIGGRAPH / EUROGRAPHICS Symposium on Computer Animation

In this paper, we present the novel concept of density maps for robust handling of static and rigid dynamic boundaries in fluid simulations based on Smoothed Particle Hydrodynamics (SPH). In contrast to the vast majority of existing approaches, we use an implicit discretization for a continuous extension of the density field throughout solid boundaries. Using the novel representation we enhance accuracy and efficiency of density and density gradient evaluations in boundary regions by computationally efficient lookups into our density maps. The map is generated in a preprocessing step and discretizes the density contribution in the boundary's near-field. In consequence of the high regularity of the continuous boundary density field, we use cubic Lagrange polynomials on a narrow-band structure of a regular grid for discretization. This strategy not only removes the necessity to sample boundary surfaces with particles but also decouples the particle size from the number of sample points required to represent the boundary. Moreover, it solves the ever-present problem of particle deficiencies near the boundary. In several comparisons we show that the representation is more accurate than particle samplings, especially for smooth curved boundaries. We further demonstrate that our approach robustly handles scenarios with highly complex boundaries and even outperforms one of the most recent sampling based techniques.

» Show BibTeX

@InProceedings{KB17,
author = {Dan Koschier and Jan Bender},
title = {Density Maps for Improved SPH Boundary Handling},
booktitle = {Proceedings of the 2017 ACM SIGGRAPH/Eurographics Symposium on Computer Animation},
year = {2017},
series = {SCA '17},
pages = {1--10},
publisher = {ACM}
}






Jan Bender, Matthias Müller, Miles Macklin
Tutorial Proceedings of Eurographics

The physically-based simulation of mechanical effects has been an important research topic in computer graphics for more than two decades. Classical methods in this field discretize Newton's second law and determine different forces to simulate various effects like stretching, shearing, and bending of deformable bodies or pressure and viscosity of fluids, to mention just a few. Given these forces, velocities and finally positions are determined by a numerical integration of the resulting accelerations. In the last years position-based simulation methods have become popular in the graphics community. In contrast to classical simulation approaches these methods compute the position changes in each simulation step directly, based on the solution of a quasi-static problem. Therefore, position-based approaches are fast, stable and controllable which make them well-suited for use in interactive environments. However, these methods are generally not as accurate as force-based methods but provide visual plausibility. Hence, the main application areas of position-based simulation are virtual reality, computer games and special effects in movies and commercials. In this tutorial we first introduce the basic concept of position-based dynamics. Then we present different solvers and compare them with the variational formulation of the implicit Euler method in connection with compliant constraints. We discuss approaches to improve the convergence of these solvers. Moreover, we show how position-based methods are applied to simulate elastic rods, cloth, volumetric deformable bodies, rigid body systems and fluids. We also demonstrate how complex effects like anisotropy or plasticity can be simulated and introduce approaches to improve the performance. Finally, we give an outlook and discuss open problems.

» Show BibTeX

@inproceedings {BMM2017,
title = "A Survey on Position Based Dynamics, 2017",
author = "Jan Bender and Matthias M{\"u}ller and Miles Macklin",
year = "2017",
booktitle = "EUROGRAPHICS 2017 Tutorials",
publisher = "Eurographics Association"
}






Lucas Beyer, Alexander Hermans, Bastian Leibe
IEEE Robotics and Automation Letters (RA-L) and IEEE Int. Conference on Robotics and Automation (ICRA'17)

TL;DR: Collected & annotated laser detection dataset. Use window around each point to cast vote on detection center.

We introduce the DROW detector, a deep learning based detector for 2D range data. Laser scanners are lighting invariant, provide accurate range data, and typically cover a large field of view, making them interesting sensors for robotics applications. So far, research on detection in laser range data has been dominated by hand-crafted features and boosted classifiers, potentially losing performance due to suboptimal design choices. We propose a Convolutional Neural Network (CNN) based detector for this task. We show how to effectively apply CNNs for detection in 2D range data, and propose a depth preprocessing step and voting scheme that significantly improve CNN performance. We demonstrate our approach on wheelchairs and walkers, obtaining state of the art detection results. Apart from the training data, none of our design choices limits the detector to these two classes, though. We provide a ROS node for our detector and release our dataset containing 464k laser scans, out of which 24k were annotated.

» Show BibTeX

@article{BeyerHermans2016RAL,
title = {{DROW: Real-Time Deep Learning based Wheelchair Detection in 2D Range Data}},
author = {Beyer*, Lucas and Hermans*, Alexander and Leibe, Bastian},
journal = {{IEEE Robotics and Automation Letters (RA-L)}},
year = {2016}
}






Paul Voigtlaender, Bastian Leibe
The 2017 DAVIS Challenge on Video Object Segmentation - CVPR Workshops

This paper describes our method used for the 2017 DAVIS Challenge on Video Object Segmentation [26]. The challenge’s task is to segment the pixels belonging to multiple objects in a video using the ground truth pixel masks, which are given for the first frame. We build on our recently proposed Online Adaptive Video Object Segmentation (OnAVOS) method which pretrains a convolutional neural network for objectness, fine-tunes it on the first frame, and further updates the network online while processing the video. OnAVOS selects confidently predicted foreground pixels as positive training examples and pixels, which are far away from the last assumed object position as negative examples. While OnAVOS was designed to work with a single object, we extend it to handle multiple objects by combining the predictions of multiple single-object runs. We introduce further extensions including upsampling layers which increase the output resolution. We achieved the fifth place out of 22 submissions to the competition.

» Show BibTeX

@article{voigtlaender17DAVIS,
author = {Paul Voigtlaender and Bastian Leibe},
title = {Online Adaptation of Convolutional Neural Networks for the 2017 DAVIS Challenge on Video Object Segmentation},
journal = {The 2017 DAVIS Challenge on Video Object Segmentation - CVPR Workshops},
year = {2017}
}






Martin Bellgardt, Sebastian Pick, Daniel Zielasko, Tom Vierjahn, Benjamin Weyers, Torsten Wolfgang Kuhlen
Workshop on Everyday Virtual Reality (WEVR)

Applications of Virtual Reality (VR) have been repeatedly explored with the goal to improve the data analysis process of users from different application domains, such as architecture and simulation sciences. Unfortunately, making VR available in professional application scenarios or even using it on a regular basis has proven to be challenging. We argue that everyday usage environments, such as office spaces, have introduced constraints that critically affect the design of interaction concepts since well-established techniques might be difficult to use. In our opinion, it is crucial to understand the impact of usage scenarios on interaction design, to successfully develop VR applications for everyday use. To substantiate our claim, we define three distinct usage scenarios in this work that primarily differ in the amount of mobility they allow for. We outline each scenario's inherent constraints but also point out opportunities that may be used to design novel, well-suited interaction techniques for different everyday usage environments. In addition, we link each scenario to a concrete application example to clarify its relevance and show how it affects interaction design.





Sebastian Freitag, Benjamin Weyers, Torsten Wolfgang Kuhlen
Proceedings of the IEEE Symposium on 3D User Interfaces (2017)

Scene visibility - the information of which parts of the scene are visible from a certain location—can be used to derive various properties of a virtual environment. For example, it enables the computation of viewpoint quality to determine the informativeness of a viewpoint, helps in constructing virtual tours, and allows to keep track of the objects a user may already have seen. However, computing visibility at runtime may be too computationally expensive for many applications, while sampling the entire scene beforehand introduces a costly precomputation step and may include many samples not needed later on.

Therefore, in this paper, we propose a novel approach to precompute visibility information based on navigation meshes, a polygonal representation of a scene’s navigable areas. We show that with only limited precomputation, high accuracy can be achieved in these areas. Furthermore, we demonstrate the usefulness of the approach by means of several applications, including viewpoint quality computation, landmark and room detection, and exploration assistance. In addition, we present a travel interface based on common visibility that we found to result in less cybersickness in a user study.

» Show BibTeX

@INPROCEEDINGS{freitag2017a,
author={Sebastian Freitag and Benjamin Weyers and Torsten W. Kuhlen},
booktitle={2017 IEEE Symposium on 3D User Interfaces (3DUI)},
title={{Efficient Approximate Computation of Scene Visibility Based on Navigation Meshes and Applications for Navigation and Scene Analysis}},
year={2017},
pages={134--143},
}






Sebastian Freitag, Clemens Löbbert, Benjamin Weyers, Torsten Wolfgang Kuhlen
Proceedings of IEEE Virtual Reality Conference 2017

Viewpoint quality estimation methods allow the determination of the most informative position in a scene. However, a single position usually cannot represent an entire scene, requiring instead a set of several viewpoints. Measuring the quality of such a set of views, however, is not trivial, and the computation of an optimal set of views is an NP-hard problem. Therefore, in this work, we propose three methods to estimate the quality of a set of views. Furthermore, we evaluate three approaches for computing an approximation to the optimal set (two of them new) regarding effectiveness and efficiency.





Sebastian Freitag, Benjamin Weyers, Torsten Wolfgang Kuhlen
Proceedings of IEEE Virtual Reality Conference 2017

The manual adjustment of travel speed to cover medium or large distances in virtual environments may increase cognitive load, and manual travel at high speeds can lead to cybersickness due to inaccurate steering. In this work, we present an approach to quickly pass regions where the environment does not change much, using automated suggestions based on the computation of common visibility. In a user study, we show that our method can reduce cybersickness when compared with manual speed control.





Daniel Zielasko, Neha Neha, Benjamin Weyers, Torsten Wolfgang Kuhlen
Proceedings of IEEE Virtual Reality Conference (2017)

The use of non-verbal vocal input (NVVI) as a hand-free trigger approach has proven to be valuable in previous work [Zielasko2015]. Nevertheless, BlowClick's original detection method is vulnerable to false positives and, thus, is limited in its potential use, e.g., together with acoustic feedback for the trigger. Therefore, we extend the existing approach by adding common machine learning methods. We found that a support vector machine (SVM) with Gaussian kernel performs best for detecting blowing with at least the same latency and more precision as before. Furthermore, we added acoustic feedback to the NVVI trigger, which increases the user's confidence. To evaluate the advanced trigger technique, we conducted a user study (n=33). The results confirm that it is a reliable trigger; alone and as part of a hands-free point-and-click interface.





Daniel Zielasko, Neha Neha, Benjamin Weyers, Torsten Wolfgang Kuhlen
Proceedings of the IEEE Symposium on 3D User Interaction (3DUI) 2017

We extended BlowClick, a NVVI metaphor for clicking, by adding machine learning methods to more reliably classify blowing events. We found a support vector machine with Gaussian kernel performing the best with at least the same latency and more precision than before. Furthermore, we added acoustic feedback to the NVVI trigger, which increases the user's confidence. With this extended technique we conducted a user study with 33 participants and could confirm that it is possible to use NVVI as a reliable trigger as part of a hands-free point-and-click interface.





Daniel Zielasko, Benjamin Weyers, Martin Bellgardt, Sebastian Pick, Alexander Meißner, Tom Vierjahn, Torsten Wolfgang Kuhlen
IEEE Virtual Reality Workshop on Everyday Virtual Reality 2017

In this work we describe the scenario of fully-immersive desktop VR, which serves the overall goal to seamlessly integrate with existing workflows and workplaces of data analysts and researchers, such that they can benefit from the gain in productivity when immersed in their data-spaces. Furthermore, we provide a literature review showing the status quo of techniques and methods available for realizing this scenario under the raised restrictions. Finally, we propose a concept of an analysis framework and the decisions made and the decisions still to be taken, to outline how the described scenario and the collected methods are feasible in a real use case.





Andrea Bönsch, Tom Vierjahn, Ari Shapiro, Torsten Wolfgang Kuhlen
IEEE Virtual Humans and Crowds for Immersive Environments (VHCIE), 2017

It is increasingly common to embed embodied, human-like, virtual agents into immersive virtual environments for either of the two use cases: (1) populating architectural scenes as anonymous members of a crowd and (2) meeting or supporting users as individual, intelligent and conversational agents. However, the new trend towards intelligent cyber physical systems inherently combines both use cases. Thus, we argue for the necessity of multiagent systems consisting of anonymous and autonomous agents, who temporarily turn into intelligent individuals. Besides purely enlivening the scene, each agent can thus be engaged into a situation-dependent interaction by the user, e.g., into a conversation or a joint task. To this end, we devise components for an agent’s behavioral design modeling the transition between an anonymous and an individual agent when a user approaches.

» Show BibTeX

@InProceedings{Boensch2017c,
Title = {{Turning Anonymous Members of a Multiagent System into Individuals}},
Author = {Andrea B\"{o}nsch, Tom Vierjahn, Ari Shapiro and Torsten W. Kuhlen},
Booktitle = {IEEE Virtual Humans and Crowds for Immersive Environments},
Year = {2017},
Abstract = {It is increasingly common to embed embodied, human-like, virtual agents into immersive virtual environments for either of the two use cases: (1) populating architectural scenes as anonymous members of a crowd and (2) meeting or supporting users as individual, intelligent and conversational agents. However, the new trend towards intelligent cyber physical systems inherently combines both use cases. Thus, we argue for the necessity of multiagent systems consisting of anonymous and autonomous agents, who temporarily turn into intelligent individuals. Besides purely enlivening the scene, each agent can thus be engaged into a situation-dependent interaction by the user, e.g., into a conversation or a joint task. To this end, we devise components for an agent’s behavioral design modeling the transition between an anonymous and an individual agent when a user approaches.},
Keywords = {Virtual Humans; Virtual Reality; Intelligent Agents; Mutliagent System},
Owner = {ab280112},
Timestamp = {2017.02.28}
}






Andrea Bönsch, Tom Vierjahn, Torsten Wolfgang Kuhlen
Proceedings of the IEEE Symposium on 3D User Interfaces 2017

Embodied, virtual agents provide users assistance in agent-based support systems. To this end, two closely linked factors have to be considered for the agents’ behavioral design: their presence time (PT), i.e., the time in which the agents are visible, and the approaching time (AT), i.e., the time span between the user’s calling for an agent and the agent’s actual availability.

This work focuses on human-like assistants that are embedded in immersive scenes but that are required only temporarily. To the best of our knowledge, guidelines for a suitable trade-off between PT and AT of these assistants do not yet exist. We address this gap by presenting the results of a controlled within-subjects study in a CAVE. While keeping a low PT so that the agent is not perceived as annoying, three strategies affecting the AT, namely fading, walking, and running, are evaluated by 40 subjects. The results indicate no clear preference for either behavior. Instead, the necessity of a better trade-off between a low AT and an agent’s realistic behavior is demonstrated.

» Show BibTeX

@InProceedings{Boensch2017b,
Title = {Evaluation of Approaching-Strategies of Temporarily Required Virtual Assistants in Immersive Environments},
Author = {Andrea B\"{o}nsch and Tom Vierjahn and Torsten W. Kuhlen},
Booktitle = {IEEE Symposium on 3D User Interfaces},
Year = {2017},
Pages = {69-72}
}






Lukas von Stumberg, Vladyslav Usenko, Jakob Engel, Jörg Stückler, Daniel Cremers
Accepted for the European Conference on Mobile Robots (ECMR), CoRR abs/1609.07835, 2017

Micro aerial vehicles (MAVs) are strongly limited in their payload and power capacity. In order to implement autonomous navigation, algorithms are therefore desirable that use sensory equipment that is as small, low-weight, and low- power consuming as possible. In this paper, we propose a method for autonomous MAV navigation and exploration using a low-cost consumer-grade quadrocopter equipped with a monocular camera. Our vision-based navigation system builds on LSD-SLAM which estimates the MAV trajectory and a semi-dense reconstruction of the environment in real-time. Since LSD-SLAM only determines depth at high gradient pixels, texture-less areas are not directly observed. We propose an obstacle mapping and exploration approach that takes this property into account. In experiments, we demonstrate our vision-based autonomous navigation and exploration system with a commercially available Parrot Bebop MAV.

» Show BibTeX

@inproceedings{stumberg2017_mavexplore,
author={Lukas von Stumberg and Vladyslav Usenko and Jakob Engel and J\"org St\"uckler and Daniel Cremers},
title={From Monoular {SLAM} to Autonomous Drone Exploration},
booktitle = {Accepted for the European Conference on Mobile Robots (ECMR)},
year = {2017},
}






Ishrat Badami, Manu Tom, Markus Mathias, Bastian Leibe
IEEE Winter Conference on Applications of Computer Vision (WACV'17).

In this paper we propose a novel approach to identify and label the structural elements of furniture e.g. wardrobes, cabinets etc. Given a furniture item, the subdivision into its structural components like doors, drawers and shelves is difficult as the number of components and their spatial arrangements varies severely. Furthermore, structural elements are primarily distinguished by their function rather than by unique color or texture based appearance features. It is therefore difficult to classify them, even if their correct spatial extent were known. In our approach we jointly estimate the number of functional units, their spatial structure, and their corresponding labels by using reversible jump MCMC (rjMCMC), a method well suited for optimization on spaces of varying dimensions (the number of structural elements). Optionally, our system permits to invoke depth information e.g. from RGB-D cameras, which are already frequently mounted on mobile robot platforms. We show a considerable improvement over a baseline method even without using depth data, and an additional performance gain when depth input is enabled.

» Show BibTeX

@inproceedings{badamiWACV17,
title={3D Semantic Segmentation of Modular Furniture using rjMCMC
},
author={Badami, Ishrat and Tom, Manu and Mathias, Markus and Leibe, Bastian},
booktitle={WACV},
year={2017}
}






Andrea Bönsch, Jonathan Wendt, Heiko Overath, Özgür Gürerk, Christine Harbring, Christian Grund, Thomas Kittsteiner, Torsten Wolfgang Kuhlen
Proceedings of IEEE Virtual Reality Conference 2017

Traditionally, experimental economics uses controlled and incentivized field and lab experiments to analyze economic behavior. However, investigating peer effects in the classic settings is challenging due to the reflection problem: Who is influencing whom?

To overcome this, we enlarge the methodological toolbox of these experiments by means of Virtual Reality. After introducing and validating a real-effort sorting task, we embed a virtual agent as peer of a human subject, who independently performs an identical sorting task. We conducted two experiments investigating (a) the subject’s productivity adjustment due to peer effects and (b) the incentive effects on competition. Our results indicate a great potential for Virtual-Reality-based economic experiments.

» Show BibTeX

@InProceedings{Boensch2017a,
Title = {Peers At Work: Economic Real-Effort Experiments In The Presence of Virtual Co-Workers},
Author = {Andrea B\"{o}nsch and Jonathan Wendt and Heiko Overath and Özgür Gürerk and Christine Harbring and Christian Grund and Thomas Kittsteiner and Torsten W. Kuhlen},
Booktitle = {IEEE Virtual Reality Conference Poster Proceedings},
Year = {2017},
Pages = {301-302}
}






Johanna Senk, Alper Yegenoglu, Olivier Amblet, Yury Brukau, Andrew Davison, David Lester, Anna Lührs, Pietro Quaglio, Vahid Rostami, Andrew Rowley, Bernd Schuller, Alan Stokes, Sacha J. Van Albada, Daniel Zielasko, Markus Diesmann, Benjamin Weyers, Michael Denker, Sonja Grün
High-Performance Scientific Computing, Januar 2017

Workflows for the acquisition and analysis of data in the natural sciences exhibit a growing degree of complexity and heterogeneity, are increasingly performed in large collaborative efforts, and often require the use of high-performance computing (HPC). Here, we explore the reasons for these new challenges and demands and discuss their impact, with a focus on the scientific domain of computational neuroscience. We argue for the need for software platforms integrating HPC systems that allow scientists to construct, comprehend and execute workflows composed of diverse processing steps using different tools. As a use case we present a concrete implementation of such a complex workflow, covering diverse topics such as HPC-based simulation using the NEST software, access to the SpiNNaker neuromorphic hardware platform, complex data analysis using the Elephant library, and interactive visualizations. Tools are embedded into a web-based software platform under development by the Human Brain Project, called Collaboratory. On the basis of this implementation, we discuss the state-of-the-art and future challenges in constructing large, collaborative workflows with access to HPC resources.




Lucas Beyer, Stefan Breuers, Vitaly Kurin, Bastian Leibe
arXiv:1705.04608

TL;DR: Explorative paper. Learn a Triplet-ReID net, embed the full image. Keep embeddings of known tracks, correlate them with image embeddings and use that as measurement model in a Bayesian filtering tracker. MOT score is mediocre, but framework is theoretically pleasing.

With the rise of end-to-end learning through deep learning, person detectors and re-identification (ReID) models have recently become very strong. Multi-camera multi-target (MCMT) tracking has not fully gone through this transformation yet. We intend to take another step in this direction by presenting a theoretically principled way of integrating ReID with tracking formulated as an optimal Bayes filter. This conveniently side-steps the need for data-association and opens up a direct path from full images to the core of the tracker. While the results are still sub-par, we believe that this new, tight integration opens many interesting research opportunities and leads the way towards full end-to-end tracking from raw pixels.

» Show BibTeX

@article{BeyerBreuers2017Arxiv,
author = {Lucas Beyer and
Stefan Breuers and
Vitaly Kurin and
Bastian Leibe},
title = {Towards a Principled Integration of Multi-Camera Re-Identification
and Tracking through Optimal Bayes Filters},
journal = {arXiv preprint arXiv:1705.04608},
year = {2017},
}






Alexander Hermans, Lucas Beyer, Bastian Leibe
arXiv:1703.07737

TL;DR: Use triplet loss, hard-mining inside mini-batch performs great, is similar to offline semi-hard mining but much more efficient.

In the past few years, the field of computer vision has gone through a revolution fueled mainly by the advent of large datasets and the adoption of deep convolutional neural networks for end-to-end learning. The person re-identification subfield is no exception to this, thanks to the notable publication of the Market-1501 and MARS datasets and several strong deep learning approaches. Unfortunately, a prevailing belief in the community seems to be that the triplet loss is inferior to using surrogate losses (classification, verification) followed by a separate metric learning step. We show that, for models trained from scratch as well as pretrained ones, using a variant of the triplet loss to perform end-to-end deep metric learning outperforms any other published method by a large margin.

» Show BibTeX

@article{HermansBeyer2017Arxiv,
title = {{In Defense of the Triplet Loss for Person Re-Identification}},
author = {Hermans*, Alexander and Beyer*, Lucas and Leibe, Bastian},
journal = {arXiv preprint arXiv:1703.07737},
year = {2017}
}






Previous Year (2016)
Disclaimer Home Visual Computing institute RWTH Aachen University