Details about the output format for semantic segmentation labels
Labels for 3D semantic segmentation can be downloaded per dataset as a pair of files: a binary (.dpn) file and a supporting metadata (.json) file.
The metadata file contains JSON in the following format: { "paint_categories": [ "paint_category_1", "paint_category_2", "paint_category_13" ] }
Here, paint_categories is a list of paint categories configured for the project the dataset belongs to.
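The following is a minimal Python sketch that reads the metadata and finds the byte value representing each category. It assumes the JSON text has already been read from the .json file. It also uses the 1-based convention that this section describes for DPN byte values. `category_values` is a hypothetical helper name.

```python
import json


def category_values(metadata_text: str) -> dict[str, int]:
    """Map each paint category name to the byte value used for it in the DPN file.

    Byte values are the 1-based positions in paint_categories; 0 is reserved
    for unpainted points, so no category maps to 0.
    """
    metadata = json.loads(metadata_text)
    return {name: value for value, name in enumerate(metadata["paint_categories"], start=1)}


metadata_text = '{"paint_categories": ["paint_category_1", "paint_category_2", "paint_category_13"]}'
values = category_values(metadata_text)
print(values["paint_category_2"])  # 2
```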
The DPN file is a binary file that contains point-level information for label category and label id. For a dataset with n points, the DPN file contains n bytes, one per point, each representing that point's label category.
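Each point has exactly one byte, so the labels for a single file can be recovered by slicing. The sketch below makes two assumptions. First, the DPN data has already been decompressed. Second, the points of each file are stored back to back in dataset order, which is what the worked example below implies. `split_dpn_per_file` is a hypothetical helper. The per-file `point_counts` are also an assumed input, which you would take from your own point cloud files.

```python
def split_dpn_per_file(dpn: bytes, point_counts: list[int]) -> list[bytes]:
    """Split a DPN buffer into one label buffer per file (one byte per point)."""
    expected = sum(point_counts)
    if len(dpn) != expected:
        raise ValueError(f"DPN has {len(dpn)} bytes, expected {expected} (one per point)")
    chunks = []
    offset = 0
    for count in point_counts:
        # Each file's labels occupy the next `count` bytes.
        chunks.append(dpn[offset:offset + count])
        offset += count
    return chunks


# Synthetic 6-point dataset split across files of 2, 3 and 1 points.
dpn = bytes([0, 1, 2, 2, 0, 1])
for labels in split_dpn_per_file(dpn, [2, 3, 1]):
    print(list(labels))
# [0, 1]
# [2, 2, 0]
# [1]
```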
For example, consider a dataset containing 3 files with 10,000, 11,000 and 9,000 points respectively. Suppose the point at index 50 in the second file of the dataset is annotated as follows:
label_category: "paint_category_2"
In this scenario, the DPN file will contain 30,000 bytes (10,000 + 11,000 + 9,000). The annotation for the point will be at index 10,050 (10,000 + 50), because the labels for the second file start right after the 10,000 bytes of the first file. The byte value at this index will be 2, which is the 1-based index of "paint_category_2" in the paint_categories list provided in the metadata. The label category value 0 is reserved for unpainted points.
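The arithmetic in this example can be checked with a short sketch. It assumes 0-based file and point indices, which reproduces the 10,000 + 50 offset when indexing the buffer in Python. The DPN buffer is synthetic: every point is unpainted except the example point, which is set to 2. `dpn_offset` is a hypothetical helper.

```python
def dpn_offset(point_counts: list[int], file_index: int, point_index: int) -> int:
    """Position of a point's label byte: all points of earlier files, plus point_index."""
    if not 0 <= point_index < point_counts[file_index]:
        raise IndexError("point_index is out of range for this file")
    return sum(point_counts[:file_index]) + point_index


point_counts = [10_000, 11_000, 9_000]
paint_categories = ["paint_category_1", "paint_category_2", "paint_category_13"]

# Synthetic DPN buffer: every point unpainted (0) except the example point.
dpn = bytearray(sum(point_counts))
offset = dpn_offset(point_counts, file_index=1, point_index=50)
dpn[offset] = paint_categories.index("paint_category_2") + 1  # 1-based value

print(len(dpn))  # 30000
print(offset)    # 10050
print(dpn[offset], paint_categories[dpn[offset] - 1])  # 2 paint_category_2
```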
To extract the metadata and compression details, look at the HTTP response headers. Below is an example response header:
< HTTP/2 200
< server: gunicorn/20.0.4
< date: Fri, 11 Dec 2020 15:12:39 GMT
< content-type: text/html; charset=utf-8
< paint-metadata: {"format": "pako_compressed", "paint_categories": ["Drivable region", "Uneven terrain", "Hard Vegetation", "Soft vegetation", "Construction object", "Other Traversable Static", "Ghost", "Static Object", "car", "Vehicle _ Truck", "Pedestrian with Object _ Adult", "Pedestrian _ Adult", "Vehicle _ Trailer", "Construction Object _ Construction Cones and Poles", "Ghost _ Sensor Artifact", "Vehicle _ Van", "Vehicle _ Car", "Ego Car", "Ground", "dynamic_buffer"]}
The paint-metadata header provides both the paint_categories list and the compression format. Here the compression format is pako_compressed, so you can use pako decompression on the downloaded DPN data to get the paint label for each point.
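The following is a minimal Python sketch that reads the paint-metadata header and decompresses the DPN payload. It assumes pako_compressed means the data was produced by pako's default deflate, which writes a standard zlib stream that Python's built-in zlib module can read. How you obtain the header value and raw bytes depends on your HTTP client. With the requests library, for example, they are available as `response.headers["paint-metadata"]` and `response.content`. Both helper names are hypothetical.

```python
import json
import zlib


def parse_paint_metadata(header_value: str) -> tuple[str, list[str]]:
    """Return (compression format, paint_categories) from the paint-metadata header."""
    metadata = json.loads(header_value)
    return metadata["format"], metadata["paint_categories"]


def decompress_dpn(payload: bytes, fmt: str) -> bytes:
    """Decompress a downloaded DPN payload into one label byte per point."""
    if fmt == "pako_compressed":
        # Assumption: zlib-wrapped deflate (pako.deflate's default).
        # MAX_WBITS | 32 lets zlib auto-detect a zlib or gzip header.
        return zlib.decompress(payload, zlib.MAX_WBITS | 32)
    raise ValueError(f"unsupported format: {fmt!r}")


# Shortened version of the header value shown above.
header = '{"format": "pako_compressed", "paint_categories": ["Drivable region", "Uneven terrain"]}'
fmt, categories = parse_paint_metadata(header)

# Stand-in for the downloaded body: zlib.compress writes the same stream format as pako.deflate.
payload = zlib.compress(bytes([0, 1, 2, 2]))
print(fmt, categories)                     # pako_compressed ['Drivable region', 'Uneven terrain']
print(list(decompress_dpn(payload, fmt)))  # [0, 1, 2, 2]
```

If the data had instead been written with pako's raw deflate mode, `zlib.decompress` would need `-zlib.MAX_WBITS` as its wbits argument.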
For the paint categories, 0 is reserved for unlabelled points. Every other byte value in the paint labels is the 1-based index of a category in the paint_categories field, so a value v corresponds to paint_categories[v - 1] in a 0-based language.
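The sketch below applies this convention to count the points in each category of a decompressed DPN buffer. It reports byte value 0 under the key None for unlabelled points. It also rejects any value that has no matching entry in paint_categories. `category_counts` is a hypothetical helper, not part of any provided tooling.

```python
from collections import Counter
from typing import Optional


def category_counts(labels: bytes, paint_categories: list[str]) -> dict[Optional[str], int]:
    """Count points per paint category; None collects unlabelled points (value 0)."""
    counts: dict[Optional[str], int] = {}
    for value, n in sorted(Counter(labels).items()):
        if value == 0:
            counts[None] = n
        elif value <= len(paint_categories):
            counts[paint_categories[value - 1]] = n  # values are 1-based indices
        else:
            raise ValueError(f"label value {value} has no paint category")
    return counts


labels = bytes([0, 2, 2, 1, 0, 0])
print(category_counts(labels, ["Drivable region", "Uneven terrain"]))
# {None: 3, 'Drivable region': 1, 'Uneven terrain': 2}
```

Each label is a single byte and 0 is reserved, so this encoding can distinguish at most 255 paint categories.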