How to convert ROS bag into JSON data for annotation

How to convert ROS bag into JSON data for 3D annotation

Annotation data is often collected by ROS and stored as ROS bags. To use deepen’s system for annotation, the data must first be converted into the JSON format.

This JSON file specifies the exact data for annotation. It includes point clouds, images, timestamps, intrinsic and extrinsic calibrations, as well as localization information.

Usually, each user writes a script to convert their bag data into the JSON format. We cannot use a common script for conversion for several reasons:

  1. Each ROS bag may have many topics. The user needs to specify the exact topics for annotation.

  2. Some information such as intrinsic and extrinsic calibrations may not be in the ROS bags.

  3. If a special camera model is used, the user may wish to only send the rectified images for annotation.

  4. The JSON format explicitly links each point cloud to the images shown to the annotators. It does not sync using timestamps like RViz.

  5. If accurate localization is available, the point cloud is transformed into a fixed world coordinate frame.

  6. Advanced processing such as LiDAR ego-motion compensation is typically performed in the script.

Users not familiar with the JSON format may find the conversion script difficult to develop. At deepen, we have worked with many clients on their data conversion scripts. Although each script is different, we have found many common themes in these scripts. If you are developing a new script, this tutorial will help you to get started and get your data ready for annotation.

ROS has good Python support. Most conversion scripts are written in Python, so this tutorial also assumes that. We will walk you through the various steps of the conversion script. We will try to describe the simplest processing. There are many advanced techniques which can improve the data, but we will not cover them here.

Reading the Bag file(s)

The first step is to specify the paths of the bag files and the ROS topics to annotate. The topics should include point clouds and images. Images are very important for annotation speed and accuracy.
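A minimal sketch of this step, assuming the ROS 1 `rosbag` Python API, where `rosbag.Bag(path).read_messages(topics=...)` yields `(topic, message, time)` tuples. The topic names are placeholders, and the helper names `group_by_topic` and `read_bag` are hypothetical.

```python
from collections import defaultdict

# Placeholder topic names; replace them with the topics in your own bag.
LIDAR_TOPIC = "/lidar/points"
CAMERA_TOPICS = ["/camera_front/image_raw", "/camera_rear/image_raw"]


def group_by_topic(messages, topics):
    """Group (topic, msg, t) tuples by topic, keeping only the wanted topics."""
    wanted = set(topics)
    grouped = defaultdict(list)
    for topic, msg, t in messages:
        if topic in wanted:
            grouped[topic].append((t, msg))
    return dict(grouped)


def read_bag(bag_path, topics):
    """Read the selected topics from one bag file (requires ROS 1 rosbag)."""
    import rosbag  # imported here so group_by_topic works without ROS installed

    with rosbag.Bag(bag_path) as bag:
        # read_messages yields (topic, msg, t); t is the time the message was recorded.
        return group_by_topic(bag.read_messages(topics=topics), topics)
```

Calling `read_bag(path, [LIDAR_TOPIC] + CAMERA_TOPICS)` would return one list of `(time, message)` pairs per topic, which the later steps can iterate over.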

Synchronizing the point clouds and images

In the 3D annotation tool, the annotator is shown a point cloud and one image from each camera. Thus, the JSON format specifies the connection between the point cloud and images. To calculate this, our script needs to explicitly synchronize the data. Often, the synchronization is done by ROS timestamps. Let us assume that there is only one LiDAR. Thus, there is only a single point cloud sequence. In this stage, we make a single pass through the point cloud data and collect its timestamps. This is our primary timestamp sequence.

We then make a pass through each image topic that we can show to the annotators. As we go through the images, we find, for each LiDAR timestamp, the image whose timestamp is closest to it. This will be the image attached to that point cloud. Thus, there is one image from each camera for each point cloud. The image sequence is synchronized to the LiDAR sequence.
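The nearest-timestamp matching described above can be sketched as follows. It assumes timestamps have already been converted to float seconds (for example with `stamp.to_sec()`) and that each camera's list is sorted; the function names are hypothetical.

```python
from bisect import bisect_left


def nearest_index(sorted_stamps, target):
    """Return the index of the timestamp in sorted_stamps closest to target."""
    if not sorted_stamps:
        raise ValueError("no timestamps to match against")
    pos = bisect_left(sorted_stamps, target)
    if pos == 0:
        return 0
    if pos == len(sorted_stamps):
        return len(sorted_stamps) - 1
    before, after = sorted_stamps[pos - 1], sorted_stamps[pos]
    return pos if after - target < target - before else pos - 1


def synchronize(lidar_stamps, camera_stamps):
    """For each LiDAR timestamp, pick the closest image index per camera.

    camera_stamps maps a camera name to its sorted list of timestamps (seconds).
    Returns one dict per LiDAR frame: {camera_name: image_index}.
    """
    return [
        {cam: nearest_index(stamps, t) for cam, stamps in camera_stamps.items()}
        for t in lidar_stamps
    ]
```

It is also worth logging the time gap of each match. An unusually large gap usually means a dropped image or a bug in the matching logic.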

Temporal Calibration (Advanced Topic)

Technically, each timestamp occurs when the LiDAR finishes a spin or a camera closes its shutter and ROS records its data. The sensors, network, and ROS itself all add latency to this process. Thus, the timestamps usually occur a few milliseconds after the actual events. This latency is typically different for each sensor and introduces inconsistency in timestamps. Temporal calibration is the method to adjust the timestamps, so all sensors have consistent timestamps. We will not cover it here, and you may skip it for your initial conversion script.


Multiple LiDARs

When there are multiple LiDARs to annotate, we can pick one LiDAR as the primary one and synchronize all other LiDARs to it. Note that deepen’s JSON format has a “d” field for each point, to which we can assign the LiDAR ID. If you use the “d” field, our annotation UI allows you to distinguish between point clouds from different LiDARs.
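As a rough illustration, the sketch below merges several synchronized point clouds into one list and tags each point with its LiDAR ID in the “d” field. Only the “d” field is named in this tutorial; the other keys (`x`, `y`, `z`, `i`) and the function name are assumptions, so check them against the JSON format documentation. The points are assumed to be already expressed in a common coordinate frame.

```python
def merge_lidars(clouds_by_lidar):
    """Merge per-LiDAR point lists into one list, tagging each point with "d".

    clouds_by_lidar maps a LiDAR ID to a list of (x, y, z, intensity) tuples
    that are already synchronized to the primary LiDAR frame.
    """
    merged = []
    for lidar_id, points in clouds_by_lidar.items():
        for x, y, z, intensity in points:
            # "x", "y", "z" and "i" are assumed key names; "d" carries the LiDAR ID.
            merged.append({"x": x, "y": y, "z": z, "i": intensity, "d": lidar_id})
    return merged
```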


Localization

The JSON format specifies all points in a single world coordinate frame because this makes annotation much easier and more accurate: all static objects stay at a fixed location, and dynamic objects show only their own simple motion, without complications from the motion of the ego vehicle. To achieve this, we need accurate localization data. The best localization usually comes from an expensive INS/GNSS. Otherwise, LiDAR SLAM combined with other sensors such as IMU and odometers can also give accurate localization in most situations.

If accurate localization is available as a ROS topic, we just need to find the LiDAR pose whose timestamp is closest to that of the LiDAR point cloud. We can use the pose to transform the point cloud into the world coordinate frame. Note that accurate LiDAR poses require LiDAR-to-INS calibration, which we will not cover here.
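A minimal sketch of this transform, assuming each pose is a `(timestamp, position, orientation)` tuple with the orientation given as a unit quaternion in `(w, x, y, z)` order. ROS `geometry_msgs` stores quaternions as `x, y, z, w`, so reorder them when reading. The function names are hypothetical.

```python
def quat_rotate(q, v):
    """Rotate vector v = (x, y, z) by unit quaternion q = (w, x, y, z)."""
    w, qx, qy, qz = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (qy * vz - qz * vy)
    ty = 2.0 * (qz * vx - qx * vz)
    tz = 2.0 * (qx * vy - qy * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + w * tx + (qy * tz - qz * ty),
        vy + w * ty + (qz * tx - qx * tz),
        vz + w * tz + (qx * ty - qy * tx),
    )


def closest_pose(poses, stamp):
    """Pick the (timestamp, position, orientation) pose closest in time to stamp."""
    return min(poses, key=lambda pose: abs(pose[0] - stamp))


def to_world(points, position, orientation):
    """Transform LiDAR-frame points into the world frame using the LiDAR pose."""
    px, py, pz = position
    world = []
    for p in points:
        rx, ry, rz = quat_rotate(orientation, p)
        world.append((rx + px, ry + py, rz + pz))
    return world
```

For large point clouds, a vectorized version with an array library would be much faster; the pure-Python loop is only meant to show the math.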

If localization is unavailable, we suggest that you try one of the LiDAR SLAM algorithms. If you skip this step and use the LiDAR point cloud as-is, it is still possible to annotate the data, but annotation would cost more and be less accurate.

Ego Motion Compensation (Advanced topic)

Most LiDARs have a slow “rolling shutter” where each scan or spin can take tens or hundreds of milliseconds. During this time interval, the LiDAR itself may be going through a complicated motion. If we treat the LiDAR as stationary for the entire time interval, the point cloud would be inaccurate. Ego motion compensation is the technique to solve this problem, but we will not cover it here. You can ignore this issue unless your vehicle was moving at a high speed such as on a highway.

Intrinsic and Extrinsic Calibrations

Accurate intrinsic calibration is required for each camera. We support the common plumb-bob (Brown & Conrady) and equidistant fisheye (Kannala & Brandt) camera models. If you choose a model we don’t currently support, you can submit rectified images instead.

Extrinsic calibration specifies the camera pose for each image. If localization and ROS tf transforms are available, you just need to obtain the camera pose at the timestamp of the image. Note that we use the OpenCV camera coordinate system (x right, y down, z forward). It is identical to the ROS optical frame convention, not the standard ROS body convention (x forward, y left, z up).
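The difference between the two conventions amounts to a fixed axis permutation. The sketch below converts a point between them, assuming both frames share the same origin and that the camera's body-frame x axis points along the viewing direction. The function names are hypothetical.

```python
def body_to_optical(point):
    """Convert a point from the ROS body convention to the optical convention.

    Body:    x forward, y left, z up.
    Optical: x right,   y down, z forward (same as OpenCV).
    """
    x, y, z = point
    return (-y, -z, x)


def optical_to_body(point):
    """Inverse of body_to_optical."""
    x, y, z = point
    return (z, -x, -y)
```

A quick sanity check: a point straight ahead of the camera should have a positive z in the optical frame. If it ends up with a positive x instead, the pose is probably still in the body convention.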

If the tf transforms cannot give you correct camera poses, but you have the extrinsic LiDAR-camera calibration, you can apply the extrinsic calibration to obtain the camera pose from the LiDAR pose. It is just a simple matrix multiplication. Since the point cloud and image usually have different timestamps, it would be more accurate to interpolate the LiDAR or camera poses, but we will skip that in this tutorial.
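The matrix multiplication can be sketched with 4x4 homogeneous transforms. The naming `a_T_b` (a transform that maps points from frame `b` into frame `a`) and the function names are assumptions made for this example. If your calibration is stored in the opposite direction, invert it first.

```python
def matmul4(a, b):
    """Multiply two 4x4 homogeneous transforms given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def camera_pose_in_world(world_T_lidar, lidar_T_camera):
    """Chain the LiDAR pose with the LiDAR-camera extrinsic calibration.

    world_T_lidar maps LiDAR-frame points to the world frame (the LiDAR pose).
    lidar_T_camera maps camera-frame points to the LiDAR frame (the extrinsic).
    The product maps camera-frame points to the world frame: the camera pose.
    """
    return matmul4(world_T_lidar, lidar_T_camera)
```

The translation column of the result is the camera position in the world frame, and the upper-left 3x3 block is its orientation.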


Output

After obtaining the information above, we just need to make a pass through the point clouds and images to output the JSON and image files. Note that the JSON format supports Base64 encoding, which can make your JSON files much smaller and the script run much faster.
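To illustrate why Base64 helps, the sketch below packs a list of coordinates as 32-bit floats and compares the size with plain JSON numbers. The exact field names and byte layout that the annotation tool expects are not given in this tutorial, so the little-endian float32 layout and the function names here are assumptions; follow the JSON format documentation for the real encoding.

```python
import base64
import json
import struct


def encode_floats(values):
    """Pack floats as little-endian float32 and return a Base64 string."""
    raw = struct.pack("<%df" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_floats(text):
    """Inverse of encode_floats, useful for checking the output."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))


# Compare the serialized size of 1000 coordinates in both representations.
xs = [i * 0.123456789 for i in range(1000)]
plain_size = len(json.dumps(xs))
packed_size = len(json.dumps(encode_floats(xs)))
```

Each float32 takes 4 bytes, or about 5.3 characters after Base64 encoding, whereas a decimal number in JSON text often takes well over 10 characters. Note that float32 keeps only about 7 significant digits, which matters for large world coordinates.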


Debugging

Debugging your first conversion script often takes a long time. We have some tips to find the most common bugs.

The first tip is to output only a small number of frames. Your script will finish quickly, and you can upload this subset into deepen’s tool and visualize the data right away.
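One simple way to do this is to cap the number of frames the script processes. The sketch below assumes your script iterates over some sequence of frames; `first_frames` is a hypothetical helper.

```python
from itertools import islice


def first_frames(frames, limit=10):
    """Return an iterator over at most `limit` frames from any iterable."""
    return islice(frames, limit)
```

Keeping the frames contiguous, rather than sampling every Nth frame, makes the subset more useful for the localization and synchronization checks that follow.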


Verifying localization

To verify your localization accuracy, the easiest way is to load a moving point cloud sequence into deepen’s annotation tool. You can use the “fuse” option to accumulate multiple point clouds together. If the static objects are sharp in the fused point cloud, you are likely to have the correct localization. If not, common causes include:

  1. Low quality or misconfigured INS/GNSS

  2. Wrong usage of INS output

  3. Wrong coordinate frame is used

  4. Bad output from LiDAR SLAM algorithm


Validating camera calibrations

To validate your camera calibrations, you can load a sequence into the annotation tool. Add a 3D box to an object. If the projection of the box on the image looks correct, you probably have the correct calibrations. Otherwise, common causes include:

  1. Giving rectified images and also non-zero distortion coefficients

  2. Using the ROS coordinate system instead of the optical coordinate system

  3. Scaling the images without updating the intrinsic parameters

  4. Bad LiDAR-camera calibration. You can use deepen’s calibration tool to redo or correct the calibration.

  5. The pose of the camera may be off due to localization error.
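For the image scaling mistake in the list above, the fix is to scale the intrinsic parameters together with the image. A minimal sketch, assuming a plain resize with no cropping; the function name is hypothetical.

```python
def scale_intrinsics(fx, fy, cx, cy, scale_x, scale_y):
    """Return (fx, fy, cx, cy) for an image resized by (scale_x, scale_y).

    Focal lengths and the principal point are in pixels, so they scale with
    the image. Distortion coefficients are unchanged by a pure resize.
    This uses the simple convention cx' = cx * scale_x; some pipelines apply
    a half-pixel correction instead.
    """
    return fx * scale_x, fy * scale_y, cx * scale_x, cy * scale_y
```

For example, halving a 1920x1080 image in both directions halves all four parameters. Cropping, by contrast, shifts only `cx` and `cy` by the crop offset.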


Validating synchronization

To validate the synchronization of your data, you can load a sequence into the annotation tool. Add a box to a moving object. If the projections on all camera images look correct, the point clouds and cameras are properly synchronized. For a high-precision check, the object should have a high angular velocity relative to the sensors. Therefore, you can use an object moving quickly from left to right or use a sequence where the ego vehicle is turning rapidly. If you find a synchronization error, common causes include:

  1. For big errors, there are likely bugs in the synchronization logic of the script.

  2. For small errors, temporal calibration may be needed.

  3. For very small errors, we may need to consider the exact time at which the LiDAR scans the object, which differs from the timestamp we assigned to the entire point cloud. This is an advanced topic.

Visualization code

We have developed an offline visualization script to visualize the JSON files. Please find the tool and documentation for it here.

Sample Code

We will release a sample conversion script soon.

Please contact us if you run into problems with your conversion script. We will gladly help you debug your script.
