Job description

**Overview:**
We're seeking a talented Machine Learning Systems Engineer to join our team and strengthen the performance and scalability of our distributed training infrastructure. As a key member of our team, you'll work closely with researchers to develop and execute large-scale training runs, and contribute to building tools that make distributed training more efficient and accessible.

**Responsibilities:**
• Collaborate with researchers to help them develop systems-efficient models and architectures
• Apply the latest techniques to our internal training runs to achieve high hardware efficiency
• Create tooling to help researchers distribute their training jobs more effectively
• Profile and optimize our training runs

**Qualifications:**
• Experience with large-scale ML training pipelines and distributed training frameworks
• Strong software engineering skills in Python
• Passion for diving deep into systems implementations and understanding fundamentals to improve their performance and maintainability
• Experience improving resource efficiency across distributed computing environments through profiling, benchmarking, and system-level optimizations

**Benefits:**
As a Machine Learning Systems Engineer at Susquehanna, you'll be part of a global quantitative trading firm that combines deep research, cutting-edge technology, and a collaborative culture. You'll play a critical role in shaping the future of AI at Susquehanna, enabling research at scale, accelerating experimentation, and helping unlock new opportunities across the firm.

Machine Learning Systems Engineer: Distributed Training
