Sigma Software
Principal Site Reliability Engineer
Job description
- Define and lead infrastructure and reliability strategy across the platform
- Design scalable, resilient systems in collaboration with engineering teams
- Optimize build, testing, and deployment processes for speed and stability
- Establish and uphold best practices for CI/CD, monitoring, and observability
- Lead incident response and drive continuous improvement post‑incident
- Automate workflows to reduce operational toil and risk
- Mentor engineers and foster a culture of operational excellence
- Make strategic build‑vs‑buy decisions balancing speed, quality, and sustainability
REQUIREMENTS
- At least 8 years of experience in Site Reliability Engineering or DevOps roles, including 2+ years in a Principal or Lead position
- Proven experience in infrastructure modernization and scaling initiatives for high‑growth environments
- Strong proficiency in Python
- Deep expertise in cloud platforms (AWS) and container orchestration services such as Amazon ECS and EKS
- Solid experience in CI/CD pipeline design and optimization using tools like GitHub Actions and Buildkite
- Proficiency in infrastructure‑as‑code tools such as Terraform
- Strong knowledge of monitoring, observability, and performance optimization practices
- Upper-Intermediate level of spoken and written English
WOULD BE A PLUS
- Experience with monorepos (Turborepo, pnpm)
- Familiarity with modern TypeScript tooling (SWC, Biome, Oxc)
- Knowledge of NestJS, Next.js, and testing frameworks (Jest, Vitest)
PERSONAL PROFILE
- Excellent leadership, communication, and decision‑making abilities
- Ability to work independently and make pragmatic build‑vs‑buy decisions in fast‑paced environments


