Rhoda-ai
Research Scientist / Engineer - Video Generation Modeling
Job type: Full-time
Posted: 2 days ago
Job description
At Rhoda AI, we’re building the next generation of generalist intelligent robots. We own the full robotics stack, from high-performance hardware and robot systems to the infrastructure and state-of-the-art foundation world models that control our robots. Our robots are designed to be generalists capable of operating in complex, real-world environments and handling long-tail edge cases, made possible by our cutting-edge research and end-to-end system design. We've raised over $400M and are investing aggressively in model research, infrastructure, hardware development, and manufacturing scale-up to make generalist robotics a reality.
We're looking for Research Scientists and Research Engineers to push the frontier of large-scale pre-training for our video action model. Our approach formulates robot control as video prediction — we pre-train causal video generation models on web-scale video data, then adapt them to predict robot actions from real-world demonstrations. You'll work on the core architectures, training objectives, and scaling strategies that determine how well our models learn from internet-scale video. We hire across levels — from senior to staff — and welcome both research-track and engineering-track candidates.
What You'll Do
- Design and train large-scale causal video generation models on web-scale video data
- Develop and validate training objectives, model architectures, and data mixtures for video prediction at scale
- Research scaling laws and data efficiency for web-scale video pretraining
- Investigate what properties of web video transfer most effectively to robotic control and action prediction
- Build systematic evaluations to measure video generation quality, long-horizon prediction fidelity, and downstream robot task performance
- Run rigorous ablations and benchmarking to understand what drives model quality at scale
- Collaborate closely with data & evaluation, post-training, and training systems teams to translate research ideas into working systems
- Publish and present work at top-tier ML and robotics venues (especially valued for the Research Scientist track)
What We're Looking For
- Strong background in large-scale generative modeling — either video generation (autoregressive video models, diffusion transformers, causal video architectures) or language model pretraining (LLMs, autoregressive transformers at scale)
- Hands-on experience training large generative models from scratch at scale
- Deep understanding of autoregressive modeling, causal architectures, and scaling behavior
- Fluency with modern ML frameworks (PyTorch required; JAX a plus)
- Ability to design experiments, interpret results, and iterate quickly
- Strong research taste: ability to identify high-leverage questions and cut through noise
- Comfort operating in a fast-moving, ambiguous startup environment
- Staff-level candidates are expected to define technical direction and drive research strategy independently; senior/MTS (Member of Technical Staff) candidates execute complex projects with strong fundamentals and growing scope
Nice to Have (But Not Required)
- PhD in ML, CS, Robotics, or a related field — or equivalent research/industry experience
- Strong publication record at NeurIPS, ICML, ICLR, CVPR, CoRL, etc. (especially valued for the Research Scientist track)
- Prior work specifically on video generation models (autoregressive video, diffusion transformers, world models, or causal video architectures)
- Experience with large-scale autoregressive language model pretraining and scaling
- Familiarity with web-scale video datasets and video data curation pipelines
- Prior work connecting video generation to control, action prediction, or robotic learning
- Familiarity with distributed training and multi-node infrastructure
Why This Role
- Work on a fundamentally different approach to robot learning — web-scale video pretraining rather than robot-data-only VLA models
- Your models give our robots the ability to understand and predict the visual world from internet-scale supervision
- Direct collaboration with data, post-training, and deployment teams with no silos
- High ownership and fast iteration in a small, elite team


