Xdof

Member of Technical Staff, Perception

Job type

Full-time

Found on Mokaru

Salary

Not disclosed by employer

Job description

At XDOF, we’re at an inflection point. Frontier labs are racing to build general-purpose robots, and high-quality training data is the bottleneck. We’re building the foundation behind the foundation models – the data collection systems, operational capability, exabyte-scale data warehouse, and software toolchain – to help our partners drive the field forward.

The Perception Algorithm team transforms raw multimodal sensor data into high-quality robot training annotations. You will be deeply involved in the complete loop from data collection to model delivery: sensor calibration, SLAM localization, human pose estimation, perception model training, and embedded deployment. Your work directly determines the quality ceiling of our training data.

Core Responsibilities

Human Pose Estimation

  • Design and optimize hand pose estimation pipelines that extract accurate joint angles from teleoperation data
  • Build full-body pose estimation systems that generate ground truth for motion capture and teleoperation action annotation
  • Research and apply markerless, vision-based pose estimation methods to reduce data collection costs
  • Fuse pose estimation outputs with robot joint angle data to generate consistent training annotations

Robot Perception & Calibration

  • Design and maintain intrinsic/extrinsic calibration pipelines for multi-camera arrays (factory calibration + online recalibration)
  • Build visual SLAM (V-SLAM) systems supporting real-time localization and scene reconstruction on data collection platforms
  • Implement hand-eye calibration between cameras and robot end-effectors
  • Develop temporal alignment solutions across multimodal sensors (cameras, IMU, data gloves, force sensors)

Perception Model Training & Deployment

  • Train and iterate on perception models including object detection, instance segmentation, and 6DoF pose estimation
  • Optimize model inference using TensorRT / CUDA for real-time performance on robot embedded platforms
  • Write custom CUDA kernels for low-level acceleration of perception tasks
  • Design evaluation metric frameworks for perception models; continuously track the relationship between model performance and data quality

End-to-End Loop from Data Collection to Model Delivery

  • Contribute to the design of automated annotation pipelines that convert sensor data into structured training labels
  • Build Auto QA modules to filter low-quality data including anomalous frames, failed demonstrations, and sensor dropouts
  • Collaborate with ML engineers and data infrastructure teams to ensure perception output formats meet downstream VLA model training requirements
  • Establish feedback mechanisms linking perception accuracy to model training outcomes, continuously improving annotation quality

Requirements

Must-Have

  • 5+ years of industry experience in robot perception or computer vision
  • Strong 3D vision fundamentals: stereo and structured-light camera principles, 3D reconstruction
  • Proficiency with SLAM frameworks (ORB-SLAM, VINS-Mono, FAST-LIO, etc.) or V-SLAM system development experience
  • Hands-on engineering experience with human pose estimation: hand joints (MediaPipe, MANO) or full-body pose (OpenPose, SMPLify, etc.)
  • Proficient in deep learning training frameworks for perception model training, tuning, and evaluation
  • TensorRT deployment experience with real-time inference optimization on embedded platforms (Jetson, Horizon, etc.)
  • CUDA programming fundamentals; ability to write or debug custom kernels
  • Proficient in C++ and Python with ROS / ROS2 development experience
  • Proficient with AI coding agents

Nice to Have

  • Engineering experience with 6DoF object pose estimation (FoundPose, FoundationPose, GDR-Net, etc.)
  • Familiarity with 3D Gaussian Splatting or NeRF for scene reconstruction or data augmentation
  • Experience with robot manipulation or teleoperation systems
  • End-to-end development experience with automated annotation pipelines or ground truth generation systems
  • Published research in perception, pose estimation, or robotics

What We Offer

  • Direct involvement in the most critical technical challenge in embodied intelligence: producing high-quality robot training data
  • The chance to work alongside top-tier robotics engineers and ML researchers
  • Proprietary hardware platforms (humanoid robots, camera arrays, data gloves)
  • A fast-paced, high-autonomy 0→1 work environment