Visual Simultaneous Localization and Mapping (SLAM) estimates camera motion from visual feature tracking while building a map of the environment. It is crucial for the autonomous navigation of robots, vehicles, and drones in GNSS-denied environments (urban canyons, tunnels, indoors) and in any environment affected by jamming or spoofing.
SLAM algorithms generally assume that features in the observed environment belong to static and rigid objects.
Thus, in crowded and dynamic environments such as urban traffic, the accuracy of camera motion estimation is heavily degraded by the large number of dynamic objects observed. To address this challenge, a real-time method for the detection and exclusion of moving objects in the motion estimation stage of a Visual SLAM frontend is presented.
Our approach integrates an instance segmentation network that produces accurate per-object masks, allowing the features of dynamic objects to be removed. To overcome real-time limitations caused by slow network inference, a mask propagation algorithm based on sparse optical flow is introduced. The proposed pipeline runs on an embedded platform and uses the GPU for inference.
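To make the two steps concrete, the sketch below is a minimal, illustrative example (not the authors' implementation), assuming OpenCV and a binary uint8 mask produced by any instance segmentation network: it propagates a dynamic-object mask between consecutive frames with Lucas-Kanade sparse optical flow and then discards keypoints that fall inside the propagated mask before motion estimation. The function names `propagate_mask` and `filter_static_features`, and the homography-based warp used for propagation, are hypothetical stand-ins for the actual scheme.

```python
import cv2
import numpy as np

def propagate_mask(prev_gray, curr_gray, prev_mask, max_corners=200):
    """Propagate a binary dynamic-object mask (uint8, 0/255) from the
    previous frame to the current one using Lucas-Kanade sparse optical
    flow on points sampled inside the mask."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7,
                                  mask=prev_mask)
    if pts is None:
        return np.zeros_like(prev_mask)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    good_prev = pts[ok].reshape(-1, 2)
    good_next = nxt[ok].reshape(-1, 2)
    if len(good_prev) < 4:
        return np.zeros_like(prev_mask)
    # Fit a homography to the tracked points and warp the mask with it
    # (a simple stand-in for a per-instance propagation scheme).
    H, _ = cv2.findHomography(good_prev, good_next, cv2.RANSAC, 3.0)
    if H is None:
        return np.zeros_like(prev_mask)
    h, w = prev_mask.shape
    return cv2.warpPerspective(prev_mask, H, (w, h), flags=cv2.INTER_NEAREST)

def filter_static_features(keypoints, dynamic_mask):
    """Keep only cv2.KeyPoint features that fall outside the dynamic mask,
    so that only (presumably) static scene points feed motion estimation."""
    h, w = dynamic_mask.shape
    static = []
    for kp in keypoints:
        x = min(max(int(kp.pt[0]), 0), w - 1)
        y = min(max(int(kp.pt[1]), 0), h - 1)
        if dynamic_mask[y, x] == 0:
            static.append(kp)
    return static
```

In this sketch, the segmentation network only needs to run on keyframes; between network outputs, the mask is carried forward frame to frame by the optical-flow propagation, which is what keeps the frontend real-time.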
We implement our method on a real vehicle, evaluate it on multiple public datasets, and show that the removal of dynamic objects increases the accuracy and robustness of the position solution.
This work was conducted under the EU-funded DREAM project.
Robust real-time automotive Visual SLAM with dynamic object removal
Published: 22 September 2025 by MDPI in European Navigation Conference 2025
Topic: Multi-Sensor and Autonomous Navigation
Keywords: Visual SLAM; Semantic SLAM; dynamic environments; autonomous navigation; instance segmentation; dynamic object removal; real-time localization
