SITA
Lead Site Reliability Engineer / Expert
Job description
Overview

WELCOME TO SITA
At SITA, we keep airports moving, airlines flying smoothly, and borders open. Our technology and communication innovations power the success of the global air travel industry. You'll find us in 95% of international airports, working closely with over 2,500 transportation and government clients. Each partnership brings unique challenges, and we thrive on delivering fresh solutions and cutting-edge tech to keep operations running like clockwork. We don't just move the world forward; we're proud to be recognized as a Great Place to Work® by 79% of our employees and certified in most of our growing locations. Here, we feel empowered, supported, and inspired to grow. Are you ready to love your job? The adventure begins right here, with you, at SITA.

ABOUT THE ROLE & TEAM
This role is responsible for ensuring highly reliable, scalable, and resilient production systems across cloud and on-prem environments. It ensures high availability, disaster recovery readiness, and continuous improvement of service performance. It leads automation initiatives for provisioning, deployment, monitoring, and self-healing to reduce manual effort and improve stability. It owns the event catalog, operational readiness, and reliability engineering practices to prevent recurrence of incidents and strengthen system resilience. It drives collaboration across Product, Engineering, T&E ICE, and Service Support Architects to ensure provider-grade reliability and seamless operational integration of new releases.

WHAT YOU’LL DO

Reliability Engineering
• Design and maintain resilient systems ensuring high availability, scalability, and fault tolerance.
• Ensure effective Disaster Recovery (DR), failover strategies, and resilience engineering across environments.
• Improve platform reliability, observability, and performance across cloud and on-premises systems.
• Establish and maintain SLIs, SLOs, and error budgets to measure and govern service reliability.
• Take ownership of production availability, capacity planning, performance tuning, and long-term reliability initiatives.

Automation, DevOps & NetOps
• Drive automation for infrastructure provisioning, deployment, monitoring, and operational workflows.
• Develop and implement auto-remediation and self-healing solutions to reduce manual intervention.
• Manage CI/CD pipelines and Infrastructure as Code (IaC) frameworks for secure, repeatable deployments.
• Implement and manage zero-downtime deployment strategies (blue-green, canary, rolling).
• Support containerized and cloud-native platforms including Kubernetes, Docker, and distributed systems.
• Support NetOps tooling and network observability, ensuring visibility into network performance, events, and operational health.

Incident, Problem & Event Management
• Perform incident management and production troubleshooting, and lead RCA/PMIR (postmortem) reviews for critical outages.
• Proactively identify reliability gaps, performance bottlenecks, and operational risks.
• Optimize incident, event, and problem management processes to reduce MTTR and improve operational efficiency.
• Define and maintain the event catalog, thresholds, and remediation workflows.
• Develop event response protocols and ensure teams are trained for rapid incident handling.

Observability & Monitoring
• Build and maintain observability solutions using monitoring, logging, tracing, and alerting platforms.
• Implement APM, distributed tracing, and proactive alerting to detect issues early.
• Integrate network telemetry and NetOps monitoring tools into the overall observability stack.
• Collaborate with stakeholders to improve event coverage and post-event learning.
• Experience with AI-assisted observability, anomaly detection, and predictive alerting.

Deployment & Operational Readiness
• Own the quality of new release deployments for the PSO.
• Conduct operational readiness assessments and manage deployment risk.
• Ensure supportability for new applications, platform releases, and infrastructure changes.
• Coordinate with internal/external stakeholders to drive continuous service improvement.

Cross-Functional Collaboration
• Work closely with Development, Platform Engineering, Product, T&E ICE, and Service Support Architects to embed reliability best practices.
• Collaborate with vendors and engineering teams to enhance system reliability and operational excellence.
• Support new product productization as the SGS technical expert and ensure operational readiness.

WHO YOU ARE

Education and Professional Qualifications:
• Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field. Master’s degree preferred for senior roles.
• Relevant certifications such as ITIL, CCNP/CCIE, Palo Alto Security, SASE, SD-WAN, Juniper Mist/Aruba, CompTIA Security+, or Certified Kubernetes Administrator (CKA).
• Certifications in cloud platforms (AWS, Azure, Google Cloud) or DevOps methodologies.
• Certifications in automation and IaC tools (Ansible, Terraform).
• Certifications in observability and monitoring platforms (Dynatrace, Prometheus, Grafana, ELK).
• Certifications in ServiceNow, Jira, or other operational tooling.

Experience:
• 8+ years in IT operations, service management, or infrastructure reliability, including roles such as Site Reliability Engineer, Problem Manager, or DevOps Engineer.
• Strong experience with high-availability systems, resilience engineering, and DR readiness.
• Deep expertise in RCA, incident management, PMIR, and implementing permanent fixes for recurring issues.
• Hands-on experience with CI/CD, automation, IaC, and self-healing/auto-remediation workflows.
• Proficiency in observability platforms (APM, logging, tracing, alerting) and integrating network telemetry / NetOps monitoring.
• Experience defining and governing SLIs, SLOs, and error budgets to improve service reliability.
• Experience with Kubernetes, containerized workloads, and distributed systems.
• Experience managing deployments, operational readiness, risk assessments, and improving event/problem management processes.
• Strong cross-functional collaboration with Development, Operations, Engineering, Product, T&E ICE, and SSA.
• Familiarity with cloud platforms, scalable architectures, and zero-downtime deployment strategies.

Technical Skills:
• Cloud Infrastructure: AWS/Azure, Linux, virtualization, HA/DR architecture.
• Automation & IaC: Ansible, Terraform, CI/CD pipelines, self-healing workflows.
• Observability & Monitoring: APM, logging, tracing, alerting, Dynatrace, Prometheus, Grafana, ELK.
• NetOps Monitoring: network telemetry, event monitoring, and operational visibility tools.
• Containerization & Orchestration: Docker, Kubernetes, distributed systems.
• Deployment & Release Engineering: zero-downtime strategies (blue-green, canary), operational readiness.
• Programming & Scripting: Python, Bash, PowerShell for automation and tooling.
• Reliability Engineering: SLIs/SLOs, error budgets, capacity planning, performance tuning.

WHAT WE OFFER
We're all about diversity. We operate in 200 countries and span 60 different languages and cultures. We're really proud of our inclusive environment. Our offices are comfortable and fun places to work, and we make sure you get to work from home too. Find out what it's like to join our team and take a step closer to your best life ever.

🏡 Flex Week: Work from home up to 2 days/week (depending on your team's needs).
⏰ Flex Day: Make your workday suit your life and plans.
🌎 Flex-Location: Take up to 30 days a year to work from any location in the world.
🌿 Employee Wellbeing: We've got you covered with our Employee Assistance Program (EAP), available to you and your dependents 24/7, 365 days/year. We also offer Champion Health, a personalized platform that supports a range of wellbeing needs.
🚀 Professional Development: At SITA, we believe growth fuels innovation. Our learning ecosystem offers access to world-class platforms and programs designed to help you thrive. From LinkedIn Learning, Microsoft's Enterprise Skills Initiative, and Airports Council International (available to all employees) to specialized solutions like Pluralsight for technology upskilling, Harvard Business Publishing for people leadership, Stanford for strategic development and many others, we align learning opportunities with your Development Plan and our business priorities. Your development journey is supported every step of the way.
🙌 Competitive Benefits: Competitive benefits that make sense with both your local market and employment status.

SITA is an Equal Opportunity Employer. We value a diverse workforce. In support of our Employment Equity Program, we encourage women, aboriginal people, members of visible minorities, and/or persons with disabilities to apply and self-identify in the application process.


