Research Topic Title
Real-Time Human Action Recognition for Mixed Reality Interfaces
Research Topic
I am working on designing an online first-person (egocentric) human action recognition (HAR) system for mixed reality. This research focuses on recognizing actions through hand gestures seen via head-mounted devices and on classifying manipulated objects (grasp recognition) in real time. Achieving high performance typically requires pre-training video transformers on extremely large-scale datasets, so our work focuses on leveraging valuable data modalities to reduce dependence on labeled data. In large-scale video-language pre-training (VLP), a well pre-trained model can be fine-tuned for many downstream tasks using only a few samples; this principle inspires our approach. We aim to develop a data-efficient estimation and detection model trained via self-supervised video pre-training (SSVP) and capable of zero-shot learning. Our method is based on multimodal fusion and knowledge transfer across modalities to improve accuracy and robustness. Our final goal is to introduce a competitive and practical framework with minimal latency and resource usage, suitable for real-world applications and deployment in virtual and augmented reality environments.
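To make the multimodal-fusion idea concrete, the following is a minimal late-fusion sketch. All names, dimensions, and the linear classification head are illustrative assumptions for this note, not the actual architecture: per-modality feature vectors (hand gestures and manipulated objects) are concatenated and scored by a single linear head.

```python
import numpy as np

# Illustrative late-fusion sketch (dimensions and the linear head are
# assumptions, not the project's actual model): embeddings from two
# modalities -- hand-gesture features and manipulated-object (grasp)
# features -- are concatenated and scored by one classification head.

rng = np.random.default_rng(0)

N_CLASSES = 4   # assumed number of action classes
D_HAND = 8      # assumed hand-gesture feature dimension
D_OBJ = 6       # assumed object (grasp) feature dimension

# Linear head parameters; learned in practice, random here for the sketch.
W = rng.standard_normal((N_CLASSES, D_HAND + D_OBJ))
b = np.zeros(N_CLASSES)

def fuse_and_classify(hand_feat: np.ndarray, obj_feat: np.ndarray) -> int:
    """Late fusion: concatenate per-modality features, apply linear head."""
    fused = np.concatenate([hand_feat, obj_feat])
    logits = W @ fused + b
    return int(np.argmax(logits))

# Synthetic features standing in for one frame's per-modality embeddings.
hand = rng.standard_normal(D_HAND)
obj = rng.standard_normal(D_OBJ)
pred = fuse_and_classify(hand, obj)
print(pred)
```

In a real system the two feature extractors would be pre-trained (e.g. via SSVP) and the fusion could be earlier or attention-based; concatenation plus a linear head is simply the cheapest baseline consistent with the low-latency goal.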
