Forbes Advisor

SRE Manager

Role

SRE Manager

Job type

Full-time

Posted

7 hours ago

Salary

Not disclosed by employer

Job description

WHAT YOU’LL DO: 

  • Lead and manage production and non-production support, ensuring high availability and system reliability.
  • Drive SRE best practices, including incident management, root cause analysis, and continuous improvement.
  • Assume ownership of major incidents and coordinate efforts to ensure quick resolution of impacting events.
  • Collaborate with SRE team members to design and develop observability practices such as dashboarding, logging, metrics, and tracing, with the aim of diagnosing and troubleshooting issues proactively.
  • Collaborate with SRE team members to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems, and monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs.
  • Identify and remove blockers, escalate appropriately, and maintain the momentum of troubleshooting efforts.
  • Ensure adherence to established incident management processes and protocols.
  • Contribute to the improvement of incident response runbooks and documentation.
  • Own internal and external communications during major incidents.
  • Translate technical details into business-impact language (scope, severity, risk, ETA, confidence level).
  • Maintain clear and continuous communication with stakeholders during incidents, providing timely updates.
  • Ensure safe execution of mitigations, rollbacks, feature flags, and failovers.
  • Lead post-incident review meetings with stakeholders to confirm event details and assign problem investigators.
  • Track and report on incident metrics, identifying patterns and areas for systemic improvement.
  • Support Change Managers and/or Problem Managers as required in carrying out their responsibilities.

WHAT YOU’VE DONE: 

  • Bachelor’s or master’s degree and/or equivalent experience relevant to the functional area.
  • 12+ years of experience in SRE / DevOps.
  • 5+ years of working experience as a Site Reliability Engineer.
  • Experience managing critical incidents in a 24/7 production environment.
  • Experience with ServiceNow ITSM and on‑call incident coordination via PagerDuty / Zenduty (or comparable ITSM/on‑call tools).

Knowledge, Skills, Abilities & Behaviours

  • Understanding of a broad range of technical concepts across SRE practices.
  • Background in cloud-based systems and SRE practices is a must.
  • Experience with at least one observability platform, such as New Relic or Datadog, preferred.
  • Ability to use AI tools to synthesize communication, reports, and troubleshooting leads.
  • Certification in AWS, ITIL, or related frameworks preferred.
  • Experience in SaaS or technology product companies preferred.
  • Strong leadership and decision-making skills under pressure.
  • Excellent verbal and written communication skills for both technical and non-technical audiences.
  • Ability to manage multiple priorities and deadlines in high-stakes situations.
  • Strong analytical skills to drive root cause analysis and trend identification.
  • Familiarity with modern monitoring and incident management tools.
  • Demonstrated ability to build consensus across diverse teams.
  • Effective at maintaining calm and focus during critical situations.
  • Knowledge of cloud infrastructure (e.g., AWS, Azure) and application architecture.
  • Proven track record of improving incident management processes.
  • Attention to detail in documentation and follow-through.
  • Adept at facilitating collaboration across remote and global teams.
  • Proactive in identifying operational risks and implementing preventive measures.
  • Committed to continuous learning and process improvement.
  • Ethical, dependable, and resilient in challenging scenarios.

● Day off on the 3rd Friday of every month (one long weekend each month)
● Monthly Wellness Reimbursement Program to promote health and well-being
● Monthly Office Commutation Reimbursement Program
● Paid paternity and maternity leave
