OneAdvanced
Senior SRE Engineer
Job description
Join OneAdvanced
We are looking for a Senior SRE Engineer with deep expertise in Kubernetes, Amazon EKS, and cloud platform engineering to lead the evolution of our core engineering platform. This role is responsible for ensuring the scalability, reliability, security, and operational excellence of the EKS infrastructure that underpins our CI/CD ecosystem, including the Harness delegate platform used by development teams to build and deploy software.
The successful candidate will drive platform automation, infrastructure standardisation, and toolchain optimisation, reducing operational overhead while enabling teams to deliver software faster and more reliably. As a technical leader, they will establish best practices for platform governance, observability, resilience, and developer experience across the organisation. This is a pivotal role in maintaining and enhancing the platform that supports how the business ships code today and scales for future growth.
What You Will Do
Cloud Platform & Infrastructure Engineering
- Design, build, and operate scalable, secure, and highly available cloud infrastructure across AWS, Azure, and GCP.
- Own and manage core platform services including networking, IAM, compute, storage, DNS, load balancing, and private connectivity.
- Implement cloud governance, operational standards, security controls, and cost optimisation practices across multi-cloud environments.
- Support enterprise-scale platform reliability, availability, and scalability objectives.
Kubernetes & Amazon EKS Platform Ownership
- Own and operate production Amazon EKS clusters that underpin the organisation's CI/CD and software delivery platforms.
- Manage Kubernetes lifecycle activities including upgrades, patching, scaling, capacity management, monitoring, and troubleshooting.
- Administer cluster resources, node groups, Karpenter, autoscaling, ingress controllers, namespaces, RBAC, networking, storage, and secrets management.
- Drive platform scalability, resilience, and operational excellence for containerised workloads and microservices.
- Troubleshoot complex Kubernetes issues including scheduling, networking, DNS, performance, and cluster reliability.
Platform Automation & Infrastructure as Code
- Develop, maintain, and standardise reusable Infrastructure as Code (IaC) components using Terraform and other automation frameworks.
- Build self-service platform capabilities that simplify infrastructure provisioning and operational workflows for engineering teams.
- Automate infrastructure lifecycle management, cloud operations, and platform maintenance activities.
- Promote infrastructure standardisation and governance through reusable modules, templates, and platform engineering practices.
CI/CD, Toolchain & Developer Platform Engineering
- Build, support, and optimise enterprise CI/CD platforms including Harness, GitHub Actions, Jenkins, and related toolchains.
- Maintain the EKS infrastructure supporting Harness delegates and deployment services used across the organisation.
- Improve deployment reliability, software delivery velocity, and operational efficiency through automation and platform enhancements.
- Support Internal Developer Platform (IDP) initiatives, including Backstage, to provide self-service capabilities for development teams.
- Drive toolchain standardisation and adoption of DevOps best practices across engineering teams.
Observability, Reliability & Operational Excellence
- Implement and maintain observability platforms using Grafana, Prometheus, CloudWatch, and centralised logging solutions.
- Develop dashboards, metrics, alerting strategies, and service health monitoring for cloud infrastructure, Kubernetes platforms, and application services.
- Lead incident response, root cause analysis, and continuous reliability improvements.
- Support production environments and critical platform services, ensuring high availability and operational resilience.
Security, Governance & Collaboration
- Implement DevSecOps practices including IAM, secrets management, vulnerability remediation, encryption, and infrastructure hardening.
- Ensure compliance with organisational security standards and cloud governance policies.
- Collaborate closely with engineering, operations, security, and product teams to deliver scalable and secure platform capabilities.
- Maintain technical documentation, runbooks, architecture diagrams, and operational procedures.
Leadership & Continuous Improvement
- Mentor engineers and promote platform engineering, SRE, and DevOps best practices across the organisation.
- Drive continuous improvement initiatives focused on automation, developer experience, reliability, scalability, and operational efficiency.
- Evaluate emerging technologies and platform capabilities to enhance the engineering ecosystem.
What You Will Have
- 4-5+ years of experience in Site Reliability Engineering (SRE), DevOps, Platform Engineering, or Infrastructure Engineering roles.
- Strong hands-on experience with AWS, Azure, and GCP cloud platforms.
- Deep expertise in Amazon EKS and Kubernetes administration, architecture, troubleshooting, upgrades, and operational management.
- Strong understanding of Kubernetes internals, container orchestration, Karpenter, Helm, and cloud-native platform operations.
- Experience building and operating enterprise-scale CI/CD platforms using Harness, GitHub Actions, Jenkins, or similar technologies.
- Proven experience supporting deployment platforms and toolchains that enable software delivery across engineering organisations.
- Advanced Infrastructure as Code (IaC) experience using Terraform, CloudFormation, and infrastructure automation frameworks.
- Experience with configuration management and automation tools such as Ansible.
- Strong scripting and automation skills using Bash, Python, or Go.
- Hands-on experience with observability and monitoring platforms including Grafana, Grafana Cloud, Prometheus, CloudWatch, ELK/OpenSearch, or similar technologies.
- Experience implementing cloud security controls, IAM, secrets management, infrastructure hardening, and DevSecOps practices.
- Experience with GitOps methodologies.
- Experience with Internal Developer Platforms (IDP), Backstage, and platform engineering concepts is preferred.
- Strong Linux administration, troubleshooting, and production support experience.
- Excellent communication, stakeholder management, and cross-functional collaboration skills.
- Demonstrated automation-first mindset with a strong focus on scalability, reliability, and operational excellence.
- AWS, Azure, and Certified Kubernetes Administrator (CKA) certifications are highly desirable.
What We Do For You
- Wellbeing focused – Our people are our greatest asset, and helping everyone feel at their best when they come to work is integral to us.
- Annual Leave – 20 days of annual leave, plus public holidays.
- Employee Assistance Programme – Free advice, support, and confidential counselling available 24/7.
- Personal Growth – Wherever you are in your career, we’re committed to enabling your growth, both personally and professionally.
- Development Programmes – From Future Managers to Leadership Training, our development programmes help you get where you need to go.
- Online Learning Platform: SkillsHub! – Learning at your fingertips, anytime, anywhere. You can access our online library of content relevant to your career growth.
- Life Insurance – 3x annual salary.
- Personal Accident Insurance – Cover in the event of serious injury or illness.
- Performance Bonus – Our Group-wide bonus scheme enables you to reap the rewards of your success.
Who We Are
At OneAdvanced, we are at the forefront of delivering sector-focused technology solutions that simplify complexity, drive meaningful progress, and help build a fairer, more inclusive society.
We’re much more than a software company. We deliver SaaS workflow applications and IT services that power organisations across Education, Government, Healthcare, Legal, Manufacturing, Housing, Retail, and more.
OneAdvanced is one of the UK’s largest business software and services companies. We are based in Birmingham (The Mailbox) and operate across the UK, Ireland, India, and Australia.
Our secure, scalable platform, including OneAdvanced AI, our private AI service for UK organisations, powers connectivity and innovation across critical sectors. Alongside our software are our IT services, including hosting, managed services, and application modernisation.
We strive to create an inclusive workplace that drives innovation and collaboration, championing diverse perspectives and ideas. Our Environmental, Social and Governance (ESG) strategy is embedded in everything we do, guiding us to create meaningful impact for our people, our customers and the planet.
Join us and become part of a team that’s powering the world of work and making a real difference.
Learn more at www.oneadvanced.com.


