General information
Parent entity
The CEA is a major research organization serving citizens, the economy, and the State.
It provides concrete solutions to their needs in four main areas: energy transition, digital transition, technologies for the medicine of the future, and defense and security, all built on a foundation of fundamental research. For more than 75 years, the CEA has been committed to the scientific, technological, and industrial sovereignty of France and Europe, for a better-controlled and safer present and future.
Located at the heart of regions equipped with very large research infrastructures, the CEA has a wide range of academic and industrial partners in France, Europe, and internationally.
The CEA's 20,000 employees share three fundamental values:
• A sense of responsibility
• Cooperation
• Curiosity
Reference
2024-33196
Unit description
Based in Saclay (Essonne), the LIST is one of the two institutes of CEA Tech, the technological research division of the CEA. Dedicated to intelligent digital systems, its mission is to carry out technological developments of excellence on behalf of industrial partners in order to create value.
Within the LIST, the Laboratory of Vision and Learning for Scene Analysis (LVA) conducts research in the field of computer vision and artificial intelligence for the perception of intelligent and autonomous systems. The laboratory's research themes include visual recognition, behavior and activity analysis, large-scale automatic annotation, and perception and decision models. These technologies are applied in major sectors such as security, mobility, advanced manufacturing, healthcare, and sports.
Job description
Field
Mathematics, scientific information, software
Contract
Internship
Job title
Label-Efficient 3D Detection with Foundation Models (M/F)
Internship topic
In this internship, you will contribute to breakthrough research in areas critical to autonomous driving, robotics, and augmented reality. We are addressing the challenge of reducing reliance on costly 3D annotations by exploring cutting-edge techniques.
As part of our team, you will:
- Leverage vision-language models (VLMs) to enhance 3D detection performance.
- Work on innovative pseudo-labeling techniques to improve model training with minimal labeled data.
- Use 2D and 3D feature integration to improve scene understanding.
- Gain hands-on experience with deep learning frameworks and 3D vision algorithms.
This internship is a fantastic opportunity to dive into the world of AI research and contribute to real-world applications. You will be working on high-impact projects with the potential for publication in top conferences. If you're passionate about machine learning, computer vision, and advanced AI techniques, we encourage you to apply!
Contract duration (in months)
6
Offer description
3D object detection is a critical component of applications such as autonomous driving, robotics, and augmented reality, where a precise understanding of the 3D environment is crucial. A key challenge is the high cost of annotating 3D bounding boxes, which makes it difficult to scale supervised learning methods to new applications.
To address this, various learning paradigms such as semi-supervised [1][2], weakly supervised [3], and unsupervised domain adaptation have been proposed to reduce the need for large amounts of annotated data while maintaining or improving performance. By leveraging minimal labeled data or even unannotated data, these approaches help reduce the reliance on costly 3D box annotations.
Most state-of-the-art methods rely on a teacher-student architecture. A crucial aspect of this approach is pseudo-label filtering, which can be done using two main strategies. One strategy involves untrained heuristics, such as confidence scores produced by detection models, while the other strategy uses uncertainty estimation modules trained on a small set of annotated 3D data. Both of these approaches, however, have limitations. Heuristics can be overly reliant on hyperparameters that may overfit, while uncertainty estimators can prove unreliable.
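To make the first strategy concrete, here is a minimal sketch of confidence-based pseudo-label filtering. It is an illustration only, not the laboratory's method: the `Box3D` class, the `filter_pseudo_labels` function, and the threshold values are all hypothetical, and a real pipeline would operate on the output tensors of a teacher detector.

```python
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical container for one teacher prediction."""
    center: tuple   # (x, y, z) in meters
    size: tuple     # (length, width, height) in meters
    yaw: float      # heading angle in radians
    label: str      # predicted class name
    score: float    # teacher confidence in [0, 1]


def filter_pseudo_labels(predictions, thresholds, default_threshold=0.5):
    """Keep only predictions whose score reaches the threshold of their class."""
    kept = []
    for box in predictions:
        if box.score >= thresholds.get(box.label, default_threshold):
            kept.append(box)
    return kept


# Toy teacher output on an unlabeled scene (values are made up).
teacher_predictions = [
    Box3D((10.0, 2.0, 0.5), (4.2, 1.8, 1.6), 0.1, "car", 0.92),
    Box3D((25.0, -3.0, 0.6), (4.0, 1.7, 1.5), 1.5, "car", 0.41),
    Box3D((6.0, 1.0, 0.9), (0.6, 0.6, 1.7), 0.0, "pedestrian", 0.55),
]

# Per-class thresholds are exactly the kind of hyperparameter that may overfit.
thresholds = {"car": 0.7, "pedestrian": 0.5}

pseudo_labels = filter_pseudo_labels(teacher_predictions, thresholds)
print([(b.label, b.score) for b in pseudo_labels])
```

The surviving boxes would then be used as training targets for the student. The sketch also shows why this heuristic is fragile: the result depends entirely on the hand-picked thresholds.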
Recent breakthroughs in 2D vision-language models (VLMs) have inspired research in 3D vision, particularly around the potential of these models for pretraining [4][5].
However, despite the promise of VLMs, there is little exploration of their use in the context of semi-supervised, weakly supervised, or unsupervised domain adaptation for 3D object detection. Therefore, we aim to fill this gap by leveraging the power of foundation models for more robust pseudo-label filtering. This could involve using pixel features from the 2D projections of 3D points to calculate intra-object coherence, as well as neighborhood incoherence scores to ensure that objects are correctly detected and isolated. Additionally, 2D features could be used as a pretext for scene completion tasks, providing finer object contours and estimating occluded parts of detected objects.
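One possible reading of the coherence idea above is sketched below. Everything in it is an assumption made for illustration: the posting does not define the scores, so intra-object coherence is taken here as the mean cosine similarity between each point's pixel feature and the object's mean feature, and neighborhood incoherence as one minus the cosine similarity between the object's mean feature and that of nearby points. The pinhole intrinsics, the toy feature map, and all function names are hypothetical. In practice the feature map would come from a 2D foundation model and the computation would be vectorized.

```python
import math


def project_point(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point (x, y, z) to integer pixel coordinates."""
    x, y, z = point
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if one is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def mean_vector(vectors):
    """Channel-wise mean of a non-empty list of feature vectors."""
    return [sum(channel) / len(vectors) for channel in zip(*vectors)]


def sample_features(points, feature_map, intrinsics):
    """Collect pixel features of the points that project inside the image."""
    height, width = len(feature_map), len(feature_map[0])
    features = []
    for point in points:
        if point[2] <= 0:  # skip points behind the camera
            continue
        u, v = project_point(point, *intrinsics)
        if 0 <= u < width and 0 <= v < height:
            features.append(feature_map[v][u])
    return features


def coherence_scores(object_points, neighbor_points, feature_map, intrinsics):
    """Return (intra-object coherence, neighborhood incoherence), or None."""
    obj = sample_features(object_points, feature_map, intrinsics)
    nbr = sample_features(neighbor_points, feature_map, intrinsics)
    if not obj or not nbr:
        return None  # nothing visible in the image, no score
    obj_mean = mean_vector(obj)
    intra = sum(cosine(f, obj_mean) for f in obj) / len(obj)
    incoherence = 1.0 - cosine(obj_mean, mean_vector(nbr))
    return intra, incoherence


# Toy 4x4 feature map with 2 channels: left half [1, 0], right half [0, 1].
feature_map = [[[1.0, 0.0] if u < 2 else [0.0, 1.0] for u in range(4)]
               for v in range(4)]
intrinsics = (1.0, 1.0, 0.0, 0.0)  # hypothetical fx, fy, cx, cy

object_points = [(2.0, 1.0, 1.0), (3.0, 1.0, 1.0), (3.0, 2.0, 1.0)]
neighbor_points = [(0.0, 1.0, 1.0), (1.0, 2.0, 1.0)]

intra, incoherence = coherence_scores(
    object_points, neighbor_points, feature_map, intrinsics)
print(intra, incoherence)
```

Under these assumed definitions, a pseudo-label whose points share similar features and differ from their surroundings scores high on both measures, which is what the toy example produces. A box that straddles two objects, or merges an object with the background, would score lower.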
[1] Zhao, N., et al. (2020). Sess: Self-ensembling semi-supervised 3d object detection. CVPR.
[2] Xu, H., et al. (2021, September). Semi-supervised 3d object detection via adaptive pseudo-labeling. ICIP.
[3] Yao, B., et al. (2024). Uncertainty-guided Contrastive Learning for Weakly Supervised Point Cloud Segmentation. IEEE Transactions on Geoscience and Remote Sensing.
[4] Chen, Z., et al. (2023). Bridging the domain gap: Self-supervised 3d scene understanding with foundation models. NeurIPS.
[5] Sirko-Galouchenko, S., et al. (2024). OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks. CVPR 2024.
Candidate profile
- Students in their 5th year of studies (M2 or gap year)
- Computer vision skills
- Machine learning skills (deep learning, perception models, generative AI…)
- Python proficiency in a deep learning framework (especially TensorFlow or PyTorch)
- Scientific research experience will be appreciated
In line with CEA's commitment to integrating people with disabilities, this job is open to all.
Job location
Site
Saclay
Job location
France, Ile-de-France, Essonne (91)
City
Palaiseau
Candidate criteria
Degree being prepared
Bac+5 - Master 2
Possibility of continuing to a PhD
Yes