Point Cloud, Data for 3D AI (1)

Summary

The revival of deep learning has produced a wealth of research and development across many areas of artificial intelligence (AI). One of the fields that has grown most remarkably to date is computer vision. Because visual information is the most intuitive way to understand and analyze the world, image data are the most widely used data type in developing AI models. Interpreting visual information was originally a very difficult problem for machine learning, but deep learning has solved it to a considerable extent. Common computer vision tasks include Classification, which assigns a label to an image; Object Detection, which locates objects; and Semantic Segmentation, which delineates the regions objects occupy. Technologies for analyzing and extracting useful information from image data have made remarkable progress and are now used across many industries.

Then what happens when the information to be interpreted is extended to three dimensions? Image data carry useful visual information, but since they are two-dimensional, it is difficult to recover three-dimensional spatial information from them. For an AI to learn three-dimensional information, it needs data that represent not only visual appearance but also three-dimensional spatial structure. In this article, I would like to introduce ‘point cloud data’, one of the main three-dimensional data types, along with some AI systems that use them.

Point Cloud Data

[Figure 1] LiDAR sensor (left) and RGB-D camera (right)
(Source: Left – Velodyne Lidar website, Right – Intel Realsense website)

Not many sensors on the market can capture 3D spatial information, and those that do, such as the LiDAR sensors and RGB-D cameras shown in [Figure 1], all represent it in the same way: they record the spatial relationships of objects as a large number of points, each carrying location information. Such a set of points with spatial information is called a point cloud, and it is the most basic form of 3D data. Any 3D data can be converted to and represented as a point cloud through post-processing.

[Figure 2] Visualization of Point Cloud Data

Characteristics of Point Cloud Data and Limitations of 3D Artificial Intelligence

Point cloud data are the data type that represents 3D spatial information, and most research on 3D AI has been conducted on them. As with 2D image data, tasks such as Classification, Object Detection, and Semantic Segmentation can be performed on 3D data based on point clouds.

[Figure 3] Point Cloud-based 3D Object Detection

However, until just a few years ago, there were very few deep learning models that used point cloud data, because point clouds are difficult to train such models on. To understand the recent deep learning models that handle point cloud data (described in a later section), we must first understand the properties of point cloud data.

Unordered and unstructured data

[Figure 4] Visualization image of Point Cloud data (left) and actual storage format (right)

Point cloud data are not standardized the way 2D image data are. 2D image data store information in a fixed grid structure, whereas point cloud data record information as a large number of points scattered in 3D space without any order. Such data pose difficult learning challenges for deep learning models, especially for grasping geometric characteristics such as the shape of an object and the interactions between points.

To address this problem, a pre-processing method was developed that converts point cloud data into a Voxel format, dividing 3D space into a regular grid of cells. However, even the Voxel format has its own limitations.
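As a concrete illustration, the minimal NumPy sketch below bins points into voxel cells. The `voxelize` helper, its grid bounds, and the 0.2 m cell size are all hypothetical choices for illustration, not the settings of any model discussed here. Running it on a toy cloud also previews the sparsity problem discussed next: only a tiny fraction of the grid ends up occupied.

```python
import numpy as np

def voxelize(points, voxel_size=0.2, grid_bounds=((0, 0, -3), (70, 40, 1))):
    """Assign each (x, y, z) point to a voxel cell; a hypothetical helper
    for illustration, not the exact preprocessing of any model here."""
    lo = np.asarray(grid_bounds[0], dtype=np.float32)
    hi = np.asarray(grid_bounds[1], dtype=np.float32)
    # Keep only points inside the region of interest.
    mask = np.all((points >= lo) & (points < hi), axis=1)
    kept = points[mask]
    # Integer voxel coordinates for each remaining point.
    coords = np.floor((kept - lo) / voxel_size).astype(np.int64)
    # Unique occupied voxels, plus which voxel each point belongs to.
    occupied, inverse = np.unique(coords, axis=0, return_inverse=True)
    return occupied, inverse, kept

# Usage on a random toy cloud: note how little of the grid is occupied.
pts = np.random.rand(10_000, 3).astype(np.float32) * [70, 40, 4] + [0, 0, -3]
occupied, inverse, kept = voxelize(pts)
total = int(np.prod(np.ceil(np.array([70, 40, 4]) / 0.2)))
print(f"occupied voxels: {len(occupied)} / {total} "
      f"({100 * len(occupied) / total:.2f}% of the grid)")
```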

The sparse nature of point cloud data

[Figure 5] Point cloud data contain far more empty space than 2D image data

While 2D image data are dense in the sense that every pixel in the grid holds a value, point cloud data have a very sparse structure: most of the 3D space the data occupy is empty. Relative to their size, these data carry little meaningful information. This sparseness persists even after the point cloud is converted into a standardized format such as voxels through pre-processing.

When an AI model is trained on such data, it is difficult to extract meaningful information, and training complexity increases. For this reason, early 3D AI models were limited and did not perform well for a long time.

Advances in 3D artificial intelligence handling point cloud data

Since then, research on 3D AI models using point cloud data has focused on overcoming these limitations. Rather than representing all of the point cloud data directly, researchers tried to extract only the meaningful information from it. PointNet [1] was an early example of this approach and showed good results.

The following are representative deep learning models that have performed well on point cloud data.

PointNet

[Figure 6] Structure of PointNet, presented by Stanford University

PointNet is trained on raw point cloud data without any transformation. Even though the points in a cloud have no order, PointNet succeeded in learning geometric properties by introducing a symmetric Max Pooling operation and a Spatial Transformer Network into the deep learning network. The initial PointNet model could perform Classification and Semantic Segmentation tasks on point cloud data.
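To make the idea concrete, here is a minimal PyTorch sketch of the PointNet-style pattern: a shared per-point MLP followed by a symmetric max-pool. It omits the T-Net alignment networks and the segmentation branch of the full model in [1]; `TinyPointNet` and its layer sizes are illustrative choices, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Minimal PointNet-style classifier sketch: a per-point shared MLP
    followed by a symmetric max-pool over all points."""
    def __init__(self, num_classes=10):
        super().__init__()
        # 1x1 convolutions act as an MLP shared across all points.
        self.shared_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Linear(1024, num_classes)

    def forward(self, points):           # points: (batch, 3, num_points)
        feats = self.shared_mlp(points)  # (batch, 1024, num_points)
        # Max over the point dimension is order-invariant, so permuting
        # the input points leaves the global feature unchanged.
        global_feat, _ = feats.max(dim=2)
        return self.head(global_feat)

# Usage: a permutation of the points yields the same logits.
model = TinyPointNet()
x = torch.randn(1, 3, 512)
perm = torch.randperm(512)
print(torch.allclose(model(x), model(x[:, :, perm]), atol=1e-5))  # True
```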

VoxelNet

[Figure 7] Structure of VoxelNet presented by Apple
(Source: Yin Zhou, et al. Excerpt from the paper “VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection”)

VoxelNet [2] is a 3D object detection model that extracts voxel features from point cloud data and then analyzes them to detect objects. The 3D space is divided into voxel units, and the points in each voxel are passed to a deep learning network called the Voxel Feature Encoding (VFE) Layer, which outputs voxel features. What distinguishes VoxelNet is that rather than simply preprocessing the point cloud into a voxel format, it creates the voxel-unit feature map through a learned network. The paper showed experimentally that feature maps obtained this way are easier to interpret than those from existing hand-crafted methods.
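The sketch below shows the spirit of one such Voxel Feature Encoding layer under simplified assumptions (no padding mask, made-up layer sizes); `TinyVFE` is an illustrative stand-in, not the exact layer from [2].

```python
import torch
import torch.nn as nn

class TinyVFE(nn.Module):
    """Sketch of one VFE layer: a shared per-point MLP, a max-pool over
    the points in each voxel, and the pooled voxel feature concatenated
    back onto every point feature."""
    def __init__(self, in_dim=7, out_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim // 2), nn.ReLU())

    def forward(self, voxels):          # voxels: (num_voxels, max_pts, in_dim)
        point_feats = self.mlp(voxels)  # per-point features
        voxel_feat, _ = point_feats.max(dim=1, keepdim=True)  # per-voxel summary
        # Broadcast the voxel-wise feature back to each point and concatenate.
        tiled = voxel_feat.expand(-1, voxels.shape[1], -1)
        return torch.cat([point_feats, tiled], dim=2)

# Usage on dummy data: 100 voxels, up to 35 points each, 7 input features
# (x, y, z, reflectance, and offsets to the voxel centroid, as in the paper).
vfe = TinyVFE()
out = vfe(torch.randn(100, 35, 7))
print(out.shape)  # torch.Size([100, 35, 32])
```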

SECOND

[Figure 8] Sparse Conv structure suited to interpreting sparse data
(Source: Yan Yan, et al. Excerpt from the paper “SECOND: Sparsely Embedded Convolutional Detection”)

Sparsely Embedded Convolutional Detection (SECOND) [3] has the same basic structure as VoxelNet, but it replaces the ordinary CNN in VoxelNet’s Convolutional Middle Layer with a Sparse Convolution layer, also called SPCONV. SPCONV dramatically reduces the computation required by a dense CNN: it precomputes index rules that map active (non-empty) input locations to output locations and computes only at those positions, skipping empty space entirely. The model therefore retains VoxelNet’s ability to interpret very sparse data such as point clouds, while processing them much faster.
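A didactic way to see why this is cheap: store only the non-empty cells and compute outputs only at those sites. The naive 2D sketch below (a hypothetical `naive_sparse_conv2d`, not the real SPCONV library) gathers active neighbours per site instead of sliding a kernel over the whole grid; real SPCONV additionally precomputes and reuses the gather/scatter rules.

```python
import numpy as np

def naive_sparse_conv2d(active, weights, kernel=3):
    """Naive sparse convolution sketch: `active` maps (y, x) coordinates
    of non-empty cells to feature vectors; outputs are computed only where
    an input site exists, never over empty space."""
    half = kernel // 2
    out = {}
    for (y, x) in active:
        acc = np.zeros(weights.shape[-1])
        # Gather only the active neighbours; empty cells contribute nothing.
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                feat = active.get((y + dy, x + dx))
                if feat is not None:
                    acc += feat @ weights[dy + half, dx + half]
        out[(y, x)] = acc
    return out

# Usage: ~50 active cells on a 1000x1000 grid; a dense convolution would
# have to visit all 1,000,000 sites.
rng = np.random.default_rng(0)
active = {(int(y), int(x)): rng.standard_normal(4)
          for y, x in rng.integers(0, 1000, size=(50, 2))}
w = rng.standard_normal((3, 3, 4, 8))  # (ky, kx, in_channels, out_channels)
print(len(naive_sparse_conv2d(active, w)))  # one output per active cell
```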

PointPillars

PointPillars [4] is a 3D object detection model that uses a new type of point cloud encoder, called a Pillar, to obtain a grid-shaped feature map from point cloud data, and then analyzes that feature map to detect objects. Where VoxelNet builds a 3D voxel-unit feature map, PointPillars projects the point cloud from a bird’s-eye view and obtains a 2D grid-unit feature map. A feature map obtained this way can be analyzed with a 2D CNN in the same way as image data, so processing is quite fast.
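As a sketch of the key scatter step (illustrative names and grid size, not the paper’s exact configuration): once per-pillar features are computed, they are written back into a dense bird’s-eye-view canvas that behaves like an ordinary multi-channel image.

```python
import torch

def scatter_pillars(pillar_feats, pillar_coords, grid_hw=(496, 432)):
    """Sketch of the PointPillars-style scatter step: per-pillar feature
    vectors are written into a dense 2D bird's-eye-view grid, producing a
    pseudo-image that an ordinary 2D CNN backbone can process."""
    channels = pillar_feats.shape[1]
    h, w = grid_hw
    canvas = torch.zeros(channels, h * w)
    # Flatten each (row, col) pillar coordinate into a canvas index.
    idx = pillar_coords[:, 0] * w + pillar_coords[:, 1]
    canvas[:, idx] = pillar_feats.t()
    return canvas.view(channels, h, w)

# Usage: 1,200 non-empty pillars with 64-dim features each.
feats = torch.randn(1200, 64)
coords = torch.stack([torch.randint(0, 496, (1200,)),
                      torch.randint(0, 432, (1200,))], dim=1)
bev = scatter_pillars(feats, coords)
print(bev.shape)  # torch.Size([64, 496, 432]), ready for a 2D CNN backbone
```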

 

As models overcoming the limitations of point cloud data were developed, research on AI models based on point clouds became much more active than before. Building on the models introduced above, research currently proceeds along two major lines. One is the ‘point-based method’, which develops models that consume point cloud data as-is, like PointNet. The other is the ‘grid-based method’, which builds grid-unit feature maps from point clouds, like VoxelNet, SECOND, and PointPillars. Researchers are pursuing both approaches, much as they did for 2D image data in earlier years. The results are already being applied in industries such as autonomous driving, robot navigation, and HD map production.

Conclusion

Compared to the rapid development of AI models for two-dimensional image data, the development of models that handle three-dimensional point cloud data has been slow. I believe the reason is that the barrier to entry was very high due to the nature of point cloud data. Fortunately, these problems have largely been overcome, and 3D AI is now developing much faster than before. In the future, AI datasets are expected to expand into 3D, and progress should accelerate from there.

Currently, Testworks is analyzing and researching point cloud data and several 3D AI models that use it. In the next article in this series, we will introduce an automated and optimized method for collecting and processing point cloud data that we have developed at Testworks. We hope you will gain a clear understanding of 3D AI technology and how to work with point cloud datasets.


References

[1] Charles R. Qi, et al. “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation”. Stanford University. (2017)

[2] Yin Zhou, et al. “VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection”. Apple. (2017)

[3] Yan Yan, et al. “SECOND: Sparsely Embedded Convolutional Detection”. (2018)

[4] Alex H. Lang, et al. “PointPillars: Fast Encoders for Object Detection from Point Clouds”. (2019)


Yesung Park

Researcher, AI model development team

Bachelor of Mechanical System Design Engineering, Hongik University

Master of Engineering, Department of Intelligent Robotics, Hanyang University Graduate School

He conducted research on AI robotics at the Intelligent Robot Research Institute of Hanyang University. Since then, he has been interested in computer vision and deep learning and is currently working in the Testworks AI Model Development Team. His major research areas are GAN and 3D AI.