Changsin Lee | Tech Evangelist | Testworks
CVPR (Computer Vision and Pattern Recognition) is the premier annual conference for computer vision and pattern recognition. Due to the ongoing pandemic, the 2021 conference was held virtually, as it was last year. Here is my quick summary of CVPR 2021, based on publicly available information.
How to search
Here are links you can bookmark for quick access:
- Conference main page: http://cvpr2021.thecvf.com/
- Open Access for papers: To search for papers quickly, go directly to https://openaccess.thecvf.com/CVPR2021
- Workshops: https://openaccess.thecvf.com/CVPR2021_workshops
- Tutorials: http://cvpr2021.thecvf.com/program
Other than the Open Access pages, the 2021 conference provided two new ways to search papers:
1. Cluster graph: Similar to word vector cluster graphs, conference papers are arranged in an interactive cluster graph by their similarity. Hovering over a dot shows the paper’s title and authors. Dragging a rectangle over an area lists the papers within that region.
2. Tableau visualization: The second search method is the Tableau visualization tool, which groups papers by area. It shows, for instance, that 3D Computer Vision was the most popular topic.
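To make the idea of arranging papers "by their similarity" concrete, here is a minimal sketch in Python. It is not how the conference's cluster graph is built (that method is not documented here). It simply assumes a bag-of-words cosine similarity over titles, and the titles and function names are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def cosine_similarity(title_a, title_b):
    """Cosine similarity between the word-count vectors of two titles."""
    counts_a = Counter(tokenize(title_a))
    counts_b = Counter(tokenize(title_b))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, titles):
    """Return the title (other than the query itself) closest to the query."""
    candidates = [t for t in titles if t != query]
    return max(candidates, key=lambda t: cosine_similarity(query, t))


# Hypothetical titles, made up for this example (not real CVPR 2021 papers).
TITLES = [
    "Monocular 3D object detection for autonomous driving",
    "3D object detection from LiDAR point clouds",
    "Self-supervised representation learning with contrastive loss",
    "Contrastive self-supervised learning for video representation",
]

if __name__ == "__main__":
    for title in TITLES:
        print(title, "->", most_similar(title, TITLES))
```

A real cluster graph would more likely use learned text embeddings and a 2D projection, but the principle is the same: papers whose vectors are close end up near each other.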
Big Topics
1. 3D Modeling: Judging by the number of papers per topic, 3D modeling was the biggest research topic, which reflects the ongoing interest in autonomous driving. The popular trend is to construct 3D scenes from multimodal data, i.e., vision, LiDAR, and/or Radar data. An exception is Tesla’s vision-only approach, which Andrej Karpathy summarized in the keynote speech for the Autonomous Driving workshop. He argued that using LiDAR was not only expensive but also not scalable in real driving conditions. Instead, Tesla is training on vision-only data that are large (millions of videos), clean (labeled data for depth, velocity, and acceleration), and diverse (a lot of edge cases). The secret sauce for building such a large dataset is auto labeling. A typical approach for building a 3D prediction model is to rely on LiDAR data as the ground truth. In contrast, Tesla used three alternative methods: an offline tracker, extra sensors, and human-in-the-loop. By predicting offline, they were able to use all kinds of expensive and heavy neural networks together with extra sensor data (e.g., Radar), and to use hindsight as well as human annotation to build massive labeled data. Based on the successful results from the vision-only approach, Karpathy made a bold claim: “Vision alone is perfectly capable of depth sensing. It is hard and it requires the fleet, but at least it is barking up the right tree.” We will see whether his claim is borne out in a few years’ time.
2. Self-supervised learning: Deep learning has an insatiable appetite for more data, but getting labeled data is expensive. The natural evolution is to find better methods to teach the neural network, and thus self-supervision is the next focus area. This is reflected in two tutorials devoted exclusively to self-supervision: self-supervised learning tutorial and Leave those nets alone: advances in self-supervised learning. Furthermore, in Towards a General Solution for Robotics, Pieter Abbeel outlined a Reinforcement Learning approach to building a general robotics AI that can learn from large-scale unsupervised representation learning with just a little human involvement through demonstrations and labeling.
3. More focus on data quality: Tesla’s vision-only approach highlights a new data-centric approach to AI. Instead of focusing on creating a better model architecture or fine-tuning hyperparameters, most of their effort was spent on building a high-quality dataset. Several workshops were dedicated to dataset issues:
- Learning from Limited or Imperfect Data
- The Second Workshop on Fair, Data-efficient, and Trusted Computer Vision
- Future of Computer Vision Datasets
- Data-Efficient Learning in An Imperfect World
- New frontiers in data-driven autonomous driving
There seems to be a general consensus among Deep Learning researchers that data is the key to AI development, as Andrew Ng argued in a recent talk.
4. Security and ethics: The shift to a data-centric approach to machine learning is spurred not only by performance metrics but, more importantly, by the increasing need to solve security and ethical issues when AI systems operate in the wild. The list of related sessions shows how important adversarial attacks and biases have become:
- Practical Adversarial Robustness in Deep Learning: Problems and Solutions
- Adversarial Machine Learning in Computer Vision
- Workshop on Adversarial Machine Learning in Real-World Computer Vision Systems and Online Challenges
- Tutorial on Fairness Accountability Transparency and Ethics in Computer Vision
- Responsible Computer Vision
- Ethical Considerations in Creative applications of Computer Vision
- Beyond Fairness: Towards a Just, Equitable, and Accountable Computer Vision
Closely related to the security and ethical issues of AI is the explainability of deep learning models, hence the tutorial on Interpretable Machine Learning for Computer Vision.
Conclusion
The general trend is to move toward a data-centric approach to AI and to consider a wide variety of security and ethical issues in AI systems. The fact that the conference became entirely virtual turned out to be a blessing in disguise because it allowed more people to access the materials. It is impossible to summarize a conference as large as CVPR 2021, but I hope this article gave you a big picture of the conference and some pointers for finding more information about the topics that interest you.