NVIDIA Isaac ROS In-Depth: cuVSLAM and the DP3.1 Release

NVIDIA's Isaac ROS software stack continues to evolve, with the DP3.1 (Developer Preview 3.1) release bringing significant improvements to its Visual SLAM (Simultaneous Localization and Mapping) capabilities. The package provides a high-performance, best-in-class ROS 2 implementation of VSLAM on the GPU, and uses a stereo camera with an IMU to estimate odometry as an input to navigation. Thanks to its GPU acceleration, it can provide real-time, low-latency odometry in a robotics application.
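
Since the Isaac ROS wrapper publishes the estimated odometry as standard ROS 2 messages, consuming it is straightforward. The sketch below subscribes to the odometry output with rclpy; the topic name follows the Isaac ROS conventions, but verify it against the release you are running:

```python
# Minimal sketch: consuming cuVSLAM's odometry output in ROS 2 (rclpy).
# The topic name /visual_slam/tracking/odometry follows Isaac ROS
# conventions, but check it against your release's documentation.
import rclpy
from rclpy.node import Node
from nav_msgs.msg import Odometry


class OdomListener(Node):
    def __init__(self):
        super().__init__('cuvslam_odom_listener')
        self.subscription = self.create_subscription(
            Odometry, '/visual_slam/tracking/odometry', self.on_odom, 10)

    def on_odom(self, msg: Odometry):
        p = msg.pose.pose.position
        self.get_logger().info(f'pose: x={p.x:.2f} y={p.y:.2f} z={p.z:.2f}')


def main():
    rclpy.init()
    rclpy.spin(OdomListener())


if __name__ == '__main__':
    main()
```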

[Figure: plots of the cuVSLAM VIO performance]

Introduction to cuVSLAM

In the DP3.1 release, NVIDIA has renamed the ELBRUS VSLAM library to cuVSLAM (short for CUDA VSLAM) and bumped its major version from 10 to 11. This change signifies the improvements and enhancements that have been made to the library. cuVSLAM is designed to provide high-quality, real-time SLAM capabilities, and the latest version includes several updates that improve its performance and functionality.

Accuracy and Performance

cuVSLAM is one of the few existing pure GPU VSLAM implementations with low translation and rotation errors as measured on the KITTI Visual Odometry / SLAM Evaluation 2012 public dataset. It outperforms ORB-SLAM2, a popular open-source SLAM library, in terms of both translation and rotation errors, but underperforms SOFT-SLAM and SOFT2, the current leaders of the KITTI benchmark. SOFT, however, has only been published as MATLAB code, which runs on the CPU and exploits the planar motion in the dataset, making it unsuitable for hand-held or full 3D VSLAM.

In addition to standard benchmarks, NVIDIA tests loop closure for VSLAM on sequences of over 1000 meters, covering both indoor and outdoor scenes. This ensures that the system can handle a wide variety of environments and conditions.

The run-time performance of the cuVSLAM node is especially impressive across platforms. On an AGX Orin, it achieves 232 frames per second (fps) at 720p resolution; on an x86_64 platform with an RTX 4060 Ti, it reaches 386 fps. While these are not your typical robot hardware architectures, it still achieves 116 fps on the more representative Orin Nano 8GB, far beyond the 30 to 60 Hz camera rates a typical robot requires.

Drawbacks of cuVSLAM

cuVSLAM is a closed-source library (a GEM in NVIDIA terms) shipped with the NVIDIA Isaac SDK, so none of the drawbacks below can be addressed by users; resolving them depends on NVIDIA's efforts, if they choose to make them. The Isaac ROS wrapper is open source, but there is little value to be found there, since its only purpose is to perform initialization and copy data to and from ROS.

We identified the following pain points that may limit its usefulness for your application:

  • Single stereo camera only. The implementation supports only a single stereo camera with an IMU, so if your robot uses multiple stereo cameras or a monocular camera, you can't use cuVSLAM.
  • No global localization guarantee. cuVSLAM does not guarantee that it will recover the correct robot pose after losing tracking. The documentation specifies that if tracking is lost (because of an obstructed camera, or motion blur during rotation), an external algorithm is required to recover the correct pose of the camera. As such, it does not solve the 'kidnapped robot' problem. NVIDIA mitigates this by providing an additional software module, in the isaac_ros_map_localization package, which uses lidar to estimate the initial position; such a robot would therefore also depend on the presence of a lidar sensor.
  • No ability to fuse other sensors. cuVSLAM does not allow updating its Kalman filter with other measurements, for example from GNSS receivers, wheel odometry or lidar. In many real-world robotic systems, VSLAM is only one part of the sensor setup and never the single source of measurements.
  • No publications. Since there is no publication explaining how the algorithms inside this library work, its limitations can only be discovered through trial and error.

Key Features of cuVSLAM VIO

The cuVSLAM VIO algorithm detects 2D features on the input images and creates landmarks from this information. A landmark is a patch of pixels from the source images, together with the coordinates of the camera position from which that patch is visible. The differences between landmarks in subsequent camera images, combined with the IMU measurements, are fused (presumably using an Extended Kalman Filter) into a pose estimate.
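
Since the fusion filter itself is undocumented, the following toy sketch only illustrates the presumed predict/update structure of such a filter (a linear Kalman filter here, for brevity); none of the names or noise values reflect cuVSLAM's actual implementation:

```python
# Toy sketch of IMU-predict / vision-update fusion with a linear Kalman
# filter. This is NOT cuVSLAM's (unpublished) implementation, only an
# illustration of the presumed predict/update structure.
import numpy as np

dt = 0.01                      # 100 Hz IMU
x = np.zeros(6)                # state: position (3) + velocity (3)
P = np.eye(6)                  # state covariance

F = np.eye(6)                  # constant-velocity transition model
F[:3, 3:] = dt * np.eye(3)
B = np.zeros((6, 3))           # IMU acceleration enters as control input
B[:3] = 0.5 * dt**2 * np.eye(3)
B[3:] = dt * np.eye(3)
Q = 1e-4 * np.eye(6)           # process noise

H = np.hstack([np.eye(3), np.zeros((3, 3))])  # vision measures position
R = 1e-2 * np.eye(3)           # measurement noise

def predict(accel):
    """IMU step: integrate acceleration into the state."""
    global x, P
    x = F @ x + B @ accel
    P = F @ P @ F.T + Q

def update(vo_position):
    """Vision step: correct the state with a position from the landmarks."""
    global x, P
    y = vo_position - H @ x                  # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ y
    P = (np.eye(6) - K @ H) @ P
```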

cuVSLAM VIO works with both planar and non-planar camera motions and exposes an option to toggle between the two setups.

[Figure: schematic of the cuVSLAM odometry thread, optimisation thread and mapping thread functions]

Map building and Loop Closure Implementation

The mapping thread collects the poses generated by the VIO algorithm and stores them in a PoseGraph: a graph that tracks the camera poses from which the landmarks are viewed. The landmarks themselves are stored in a data structure that does not grow when the same landmark is visited more than once. This collection of landmarks and poses is typically referred to as the map.

Now imagine a robot moving around and returning to the same place. Since odometry always accumulates some drift, the estimated trajectory deviates from the ground truth. Because cuVSLAM stores all landmarks and continuously verifies whether each landmark has been seen before, it can detect that the robot has returned to the same place, i.e. its original position.

At this moment, a connection is added to the PoseGraph, closing the loop from the free end of the pose trail back to the original position; this event is termed 'loop closure'. Immediately afterwards, cuVSLAM performs a graph optimization that corrects the current odometric pose and all previous poses in the graph.
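
The optimizer is closed source, but the effect of a loop-closure edge is easy to demonstrate. The toy example below builds a one-dimensional pose graph with drifting odometry edges plus one loop-closure constraint and solves it as a linear least-squares problem; all numbers are illustrative:

```python
# Toy 1D pose-graph optimization: four odometry edges with drift, plus
# one loop-closure edge stating that pose 4 coincides with pose 0.
import numpy as np

# edges: (i, j, measured displacement from pose i to pose j)
odometry = [(0, 1, 1.0), (1, 2, 1.1), (2, 3, 1.0), (3, 4, -3.4)]
loop_closures = [(0, 4, 0.0)]   # robot returned to its start

n = 5
A, b = [], []
# Fix pose 0 at the origin (gauge constraint).
row = np.zeros(n); row[0] = 1.0
A.append(row); b.append(0.0)
for i, j, d in odometry + loop_closures:
    row = np.zeros(n)
    row[j], row[i] = 1.0, -1.0   # residual: (x_j - x_i) - d
    A.append(row); b.append(d)

x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
print(x)  # the drift is redistributed over all poses, closing the loop
```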

[Figure: block diagram of the data structures and algorithms of the cuVSLAM algorithm]

No further documentation about this process is available, but it resembles a classical implementation of loop closure, albeit implemented on a GPU.

Handling Changing Terrain and Poor Visual Conditions

The procedure for adding landmarks is designed such that if a landmark is not observed at the place where it was expected, it is marked for eventual deletion. This allows you to use cuVSLAM over changing terrain.
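
The exact bookkeeping is not documented; a common pattern, sketched below with hypothetical names, is to count consecutive missed re-observations and evict a landmark once a threshold is exceeded:

```python
# Hypothetical landmark bookkeeping: evict landmarks that repeatedly
# fail re-observation, so the map adapts to changing terrain.
MAX_MISSES = 5

class Landmark:
    def __init__(self, patch, pose):
        self.patch = patch    # pixel patch describing the landmark
        self.pose = pose      # camera pose it was first seen from
        self.misses = 0

def prune(landmarks, expected_visible, observed_ids):
    """Mark expected-but-unseen landmarks; delete persistent misses."""
    for lid in expected_visible:
        lm = landmarks[lid]
        if lid in observed_ids:
            lm.misses = 0
        else:
            lm.misses += 1
    return {lid: lm for lid, lm in landmarks.items()
            if lm.misses <= MAX_MISSES}
```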

Along with visual data, cuVSLAM can use Inertial Measurement Unit (IMU) measurements. It automatically falls back to the IMU when visual odometry is unable to estimate a pose, for example in dark lighting or when facing long, feature-poor surfaces. This fallback can only bridge gaps shorter than about one second, after which pose tracking is lost.
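
That one-second limit is plausible when you consider that IMU-only dead reckoning double-integrates noisy accelerations, so position error grows quadratically with time. A back-of-the-envelope illustration:

```python
# Why IMU-only tracking degrades fast: a constant accelerometer bias b
# integrates into a position error of 0.5 * b * t^2.
bias = 0.05  # m/s^2, a plausible consumer-grade accelerometer bias
for t in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"t={t:4.1f} s -> position error ~ {0.5 * bias * t**2 * 100:.1f} cm")
# After 1 s the error is only a few centimetres; after a few seconds it
# grows to tens of centimetres and is no longer usable as odometry.
```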

Saving and Loading Maps

Naturally, we would like to save the stored landmarks and pose graph to a file, to be used later or by another robot. NVIDIA has implemented a ROS 2 action called SaveMap to save the map to disk. Once saved, the map can later be used to localize the robot. To load the map into memory, there is a ROS 2 action called LoadMapAndLocalize. It requires a map file path and a prior pose, which is an initial guess of where the robot is in the map. Given the prior pose and the current set of camera frames, cuVSLAM tries to match the landmarks it currently observes against those in the requested map. If localization succeeds, cuVSLAM loads the map into memory; otherwise, it continues building a new map.
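
Calling these actions from rclpy could look like the sketch below. The action types and goal fields (map_url, localize_near_point) match the DP3-era isaac_ros_visual_slam_interfaces definitions as we understand them, and the action server names are assumptions; verify both against your installation:

```python
# Sketch: saving and re-loading a cuVSLAM map via the ROS 2 actions.
# Action server names and goal field names are assumptions based on the
# DP3-era interface definitions; verify against your installed release.
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from geometry_msgs.msg import Point
from isaac_ros_visual_slam_interfaces.action import SaveMap, LoadMapAndLocalize


class MapClient(Node):
    def __init__(self):
        super().__init__('cuvslam_map_client')
        self.save = ActionClient(self, SaveMap, '/visual_slam/save_map')
        self.load = ActionClient(self, LoadMapAndLocalize,
                                 '/visual_slam/load_map_and_localize')

    def save_map(self, path: str):
        goal = SaveMap.Goal()
        goal.map_url = path
        self.save.wait_for_server()
        return self.save.send_goal_async(goal)  # caller must spin the node

    def load_map(self, path: str, x: float, y: float, z: float):
        goal = LoadMapAndLocalize.Goal()
        goal.map_url = path
        goal.localize_near_point = Point(x=x, y=y, z=z)  # prior pose guess
        self.load.wait_for_server()
        return self.load.send_goal_async(goal)  # caller must spin the node
```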

Supported Stereo Cameras

[Image: Intel, ZED and Leopard Imaging 3D cameras]

As the last part of this review, we made an overview of the stereo cameras officially supported by cuVSLAM. We found the following references to work on both NVIDIA platforms and plain Intel setups:

  • Leopard Imaging Hawk 3D Depth Camera. A price-competitive stereo camera with a 15 cm baseline, running at 60 Hz with an integrated IMU.
  • Stereolabs ZED 2 and ZED X. Both have a 12 cm stereo baseline, with the ZED 2 running at 30 Hz and the ZED X at 60 Hz in full HD resolution. cuVSLAM requires the ZED SDK to be installed and working properly, since it relies on the ZED's undistortion algorithms.
  • Intel RealSense family. A popular stereo camera with a 5 cm baseline, but with the limitation that the camera's 3D depth map function can't work reliably: the stereo pair is also used for visual odometry, so the depth IR projector needs to be turned off.

Next steps?

Hopefully this made you a bit wiser about the ins and outs of the cuVSLAM library. Intermodalics offers professional support on this and other VSLAM libraries, so don't hesitate to reach out!
