Xdof

Member of Technical Staff, Vision / Language

Job type

Full-time

Found on Mokaru

1 week ago

Salary

Not disclosed by employer

Job description

ABOUT XDOF

Frontier labs are racing to build general-purpose robots, and the bottleneck isn't compute. It's data. At XDOF, we're building the foundation behind the foundation models: the data collection systems, annotation pipelines, exabyte-scale data infrastructure, and software toolchain that enable our partners to push the field forward.

We're hiring a Research Engineer / Scientist to help lead technical efforts at the intersection of vision-language models and robot learning. You will build systems that turn raw egocentric and teleoperation video into high-signal training data for VLA models, and increasingly, contribute to the models themselves.

Beyond pipelines, you will drive research into what makes robot data useful: discovering new metadata (contact events, affordance labels, implicit reward signals, dynamics priors from video) that unlock capabilities current approaches miss. You'll explore how structured annotations can improve cross-embodiment transfer, automatic curriculum generation, and world models that predict what actually matters for manipulation. The data layer isn't downstream of the research. It is the research.

WHAT YOU'LL DO

  • Design and implement vision-language pipelines for egocentric and teleoperation video: structured captioning, temporal grounding, action-conditioned scene understanding, and semantic annotation at scale
  • Develop and evaluate representations that bridge visual perception, language, and low-level robot action — spanning VLAs, video prediction, and world models
  • Build and improve data curation systems that assess quality, diversity, and coverage of large-scale robot demonstration datasets
  • Work hands-on with bimanual and high-DoF manipulation data, including real teleoperation footage and sim-generated rollouts
  • Collaborate directly with partner labs to define data requirements and close the loop between data quality and downstream policy performance
  • Stay current on the research frontier (VLAs, video foundation models, flow matching, DiT architectures, egocentric pretraining) and translate insights into production systems

REQUIRED

  • MS or PhD in Computer Science, Robotics, Machine Learning, or a related field from a top-tier program
  • 3–7 years of research or applied research experience (industry or academic) in one or more of: vision-language models, video understanding, robot learning, or generative modeling
  • Deep fluency in PyTorch; working knowledge of large-scale training infrastructure (distributed training, mixed precision, large batch workflows)
  • Published work or demonstrable impact in VLMs/VLAs, video representation learning, imitation learning, or a closely related area
  • Strong engineering fundamentals — you can design clean systems, not just run experiments

BENEFITS

  • Competitive compensation and equity
  • Comprehensive health and wellness benefits
  • Flexible work arrangements
  • Collaborative and fast-paced work environment
  • Opportunity to shape the future of robotics and AI alongside an ambitious, values-driven team

Level: Mid Level to Senior Research Scientist (L4–L5 equivalent)

Location: San Mateo

Note: Junior candidates will still be considered.

If you’re excited to help build the infrastructure powering tomorrow’s intelligent machines, we’d love to hear from you!
