This work is released under the GPLv3 license. For any commercial purpose, please contact the authors. A software implementation of this project will soon be available on GitHub.
Imitation learning is a common paradigm for teaching robots new tasks. However, collecting robot demonstrations through teleoperation or kinesthetic teaching is tedious and time-consuming, slowing down the collection of training data for policy learning. In contrast, directly demonstrating a task with our own human embodiment is much easier, and such data is available in abundance, although transferring it to the robot is non-trivial. In this work, we propose Real2Gen, which trains a manipulation policy from a single human demonstration. Real2Gen extracts the required information from the demonstration and transfers it to a simulation environment, where a programmable expert agent can demonstrate the task arbitrarily many times, generating an unlimited amount of data for training a flow matching policy. We evaluate Real2Gen on human demonstrations of three different real-world tasks and compare it to a recent baseline. Real2Gen improves the success rate by 26.6% on average and generalizes better, owing to the abundance and diversity of its training data.
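Since the implementation is not yet released, the following is a minimal sketch of the conditional flow matching objective such a policy could be trained with, using the standard linear interpolation path. All names (`PolicyNet`, `flow_matching_loss`) and dimensions are our assumptions for illustration, not the actual Real2Gen code.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Hypothetical policy network predicting the velocity field v(a_t, t | obs)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, a_t, t):
        # Condition the velocity prediction on observation, noisy action, and time.
        return self.net(torch.cat([obs, a_t, t], dim=-1))

def flow_matching_loss(model, obs, actions):
    """Conditional flow matching loss on a batch of (observation, expert action) pairs."""
    a0 = torch.randn_like(actions)        # noise sample a_0
    t = torch.rand(actions.shape[0], 1)   # random time in [0, 1]
    a_t = (1 - t) * a0 + t * actions      # point on the linear path a_0 -> a_1
    v_target = actions - a0               # target velocity of the linear path
    v_pred = model(obs, a_t, t)
    return ((v_pred - v_target) ** 2).mean()

# Usage on synthetic data (dimensions assumed):
model = PolicyNet(obs_dim=10, act_dim=7)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
obs = torch.randn(64, 10)       # batch of observations from the simulator
actions = torch.randn(64, 7)    # matching actions from the scripted expert
loss = flow_matching_loss(model, obs, actions)
loss.backward()
opt.step()
```

At inference time, actions would be sampled by integrating the learned velocity field from Gaussian noise to t = 1, e.g., with a few Euler steps.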
If you find our work useful, please consider citing our paper:
This work was funded by the Carl Zeiss Foundation through the ReScaLe project.
Nick Heppert is supported by the Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA) through the DAAD programme Konrad Zuse Schools of Excellence in Artificial Intelligence, sponsored by the Federal Ministry of Education and Research.