Large Language Model Instructed Humanoid Robot

  • Project Year: 2025-26
  • Departments Represented: MEng
  • Industry/Track: Robotics, Aerospace, or Automotive Advancements

This project presents MoFL, a unified pipeline that enables humanoid robots to learn tasks directly from natural language instructions, without real-world demonstrations. Using generative video models such as Veo 3.1 and Seedance 2.0, the system generates synthetic “imagined” demonstrations that serve as scalable training data. These videos are converted into executable robot behavior through 3D scene reconstruction, motion retargeting, and policy learning, enabling end-to-end text-to-motion generation. The system produces physically plausible whole-body control in simulation across a range of tasks and is verified through real-world deployment. This work highlights the potential of generative models as scalable data sources and planning priors for robotics.

  • Advisor(s): Koushil Sreenath
  • Team: Yiren Rong [ME], David Chen [EECS], Matei Dardea [EECS]