Sig

Machine Learning Systems Engineer: Distributed Training

Philadelphia, Pennsylvania, US | Full-time | Today | via Zippia

Job description

**Overview:**
We're seeking a talented Machine Learning Systems Engineer to join our team and strengthen the performance and scalability of our distributed training infrastructure. As a key member of our team, you'll work closely with researchers to develop and execute large-scale training runs, and contribute to building tools that make distributed training more efficient and accessible.

**Responsibilities:**
• Collaborate with researchers to enable them to develop systems-efficient models and architectures
• Apply the latest techniques to our internal training runs to achieve impressive hardware efficiency
• Create tooling to help researchers distribute their training jobs more effectively
• Profile and optimize our training runs

**Qualifications:**
• Experience with large-scale ML training pipelines and distributed training frameworks
• Strong software engineering skills in Python
• Passion for diving deep into systems implementations and understanding fundamentals to improve their performance and maintainability
• Experience improving resource efficiency across distributed computing environments by leveraging profiling, benchmarking, and implementing system-level optimizations

**Benefits:**
As a Machine Learning Systems Engineer at Susquehanna, you'll be part of a global quantitative trading firm that combines deep research, cutting-edge technology, and a collaborative culture. You'll play a critical role in shaping the future of AI at Susquehanna, enabling research at scale, accelerating experimentation, and helping unlock new opportunities across the firm.
