XDOF
Member of Technical Staff, Perception
Salary
Job description
At XDOF, we’re at an inflection point. Frontier labs are racing to build general-purpose robots, and high-quality training data is the bottleneck. We’re building the foundation behind the foundation models – the data collection systems, operational capability, exabyte-scale data warehouse, and software toolchain – to help our partners drive the field forward.
The Perception Algorithm team transforms raw multimodal sensor data into high-quality robot training annotations. You will be deeply involved in the complete loop from data collection to model delivery — sensor calibration, SLAM localization, human pose estimation, perception model training, and embedded deployment. Your work directly determines the quality ceiling of our training data.
Core Responsibilities
Human Pose Estimation
- Design and optimize hand pose estimation pipelines that extract accurate joint angles from teleoperation data
- Build full-body pose estimation systems for motion capture and for generating ground-truth action annotations from teleoperation
- Research and apply markerless, vision-based pose estimation methods to reduce data collection costs
- Fuse pose estimation outputs with robot joint angle data to generate consistent training annotations
Robot Perception & Calibration
- Design and maintain intrinsic/extrinsic calibration pipelines for multi-camera arrays (factory calibration + online recalibration)
- Build visual SLAM (V-SLAM) systems supporting real-time localization and scene reconstruction on data collection platforms
- Implement hand-eye calibration between cameras and robot end-effectors
- Develop temporal alignment solutions across multimodal sensors (cameras, IMU, data gloves, force sensors)
Perception Model Training & Deployment
- Train and iterate on perception models including object detection, instance segmentation, and 6DoF pose estimation
- Optimize model inference using TensorRT / CUDA for real-time performance on robot embedded platforms
- Write custom CUDA kernels for low-level acceleration of perception tasks
- Design evaluation metric frameworks for perception models; continuously track the relationship between model performance and data quality
End-to-End Loop from Data Collection to Model Delivery
- Contribute to the design of automated annotation pipelines that convert sensor data into structured training labels
- Build Auto QA modules to filter low-quality data including anomalous frames, failed demonstrations, and sensor dropouts
- Collaborate with ML engineers and data infrastructure teams to ensure perception output formats meet downstream VLA model training requirements
- Establish feedback mechanisms linking perception accuracy to model training outcomes, continuously improving annotation quality
Requirements
Must-Have
- 5+ years of industry experience in robot perception or computer vision
- Strong 3D vision fundamentals: stereo and structured-light camera principles, 3D reconstruction
- Proficiency with SLAM frameworks (ORB-SLAM, VINS-Mono, FAST-LIO, etc.) or V-SLAM system development experience
- Hands-on engineering experience with human pose estimation: hand joints (MediaPipe, MANO) or full-body pose (OpenPose, SMPLify, etc.)
- Proficient in deep learning frameworks for perception model training, tuning, and evaluation
- TensorRT deployment experience with real-time inference optimization on embedded platforms (Jetson, Horizon, etc.)
- CUDA programming fundamentals; ability to write or debug custom kernels
- Proficient in C++ and Python with ROS / ROS2 development experience
- Proficient with AI coding agents
Nice to Have
- Engineering experience with 6DoF object pose estimation (FoundPose, FoundationPose, GDR-Net, etc.)
- Familiarity with 3D Gaussian Splatting or NeRF for scene reconstruction or data augmentation
- Experience with robot manipulation or teleoperation systems
- End-to-end development experience with automated annotation pipelines or ground truth generation systems
- Published research in perception, pose estimation, or robotics
What We Offer
- Direct involvement in the most critical technical challenge in embodied intelligence: producing high-quality robot training data
- The opportunity to work alongside top-tier robotics engineers and ML researchers
- Access to proprietary hardware platforms (humanoid robots, camera arrays, data gloves)
- A fast-paced, high-autonomy 0→1 work environment


