Ifm-us

HPC Engineer

Company

Ifm-us

Role

HPC Engineer

Location

US

Job type

Full-time

Found on Mokaru

3 weeks ago

Salary

$150k - $300k

Job description

Responsibilities

  • Monitor health, performance, and availability of large-scale GPU clusters.
  • Respond to incidents and perform first-level triage.
  • Support researchers and troubleshoot job failures.
  • Execute operational runbooks and recovery procedures.
  • Validate cluster deployments, upgrades, and maintenance activities.
  • Track infrastructure utilization and operational metrics.
  • Develop automation and monitoring tools.
  • Contribute to documentation and reporting.

Education

Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, Information Technology, Electrical Engineering, Mathematics, Physics, or a related discipline.

Experience

  • 2+ years in Linux systems administration, SRE, DevOps, cloud operations, HPC, or infrastructure operations.
  • Strong Linux troubleshooting skills.
  • Experience with scripting using Python or Bash.

Preferred Qualifications

  • Slurm.
  • GPU infrastructure.
  • AWS, Azure, or GCP.
  • Grafana, Prometheus, Datadog, or similar tools.
  • Containers and Kubernetes.
  • AI/ML infrastructure exposure.
  • Research computing environments.