Senior Cloud Engineer – Containerization - Kubernetes & GitOps (IGT1)
Job type: Full-time
Job description
As a Senior Cloud Engineer, you will build, operate, and improve Kubernetes‑based platforms supporting Rhapsody’s cloud services. You will own cluster reliability, GitOps‑driven deployments (Argo CD or similar), infrastructure‑as‑code with Terraform modules, and production monitoring using Grafana‑style dashboards. You’ll collaborate with SRE, Security, and Engineering to deliver resilient, observable, and cost‑aware services in a 24×7 environment.
Key Responsibilities
- Operate and harden Kubernetes clusters: upgrades and patching, node pools, CNI, ingress, certificates, autoscaling, quotas, RBAC, and multi‑environment promotion.
- Implement and maintain GitOps workflows using Argo CD (or Flux): app definitions, health policies, sync strategies, drift detection, rollback.
- Standardize platform add‑ons via Helm/Kustomize (ingress, cert-manager, secrets, and log, metric, and trace agents).
- Build reusable Terraform modules (networking, cluster, storage, identity, observability) and enforce plan/apply and code‑review workflows.
- Create Python/Shell automation for cluster operations, validations, drift remediation, image promotion, and capacity and cost hygiene.
- Develop and tune Grafana‑style dashboards and alerts; reduce alert noise, improve MTTR, and document RCAs.
- Apply least‑privilege, secrets hygiene, image provenance, and policy controls; execute maintenance windows, patching, and upgrades.
- Keep runbooks, diagrams, and SOPs current; contribute to the knowledge base and mentor junior engineers.
- Collaborate with internal/external stakeholders during deployments, cutovers, and incidents; communicate trade‑offs and status clearly.
Required Qualifications
- 3–5+ years in Cloud/SRE/Platform Engineering supporting production systems.
- Hands‑on Kubernetes operations (cluster lifecycle, ingress, certs, autoscaling, RBAC, Helm/Kustomize).
- Experience with GitOps (Argo CD or Flux) and declarative release practices.
- Strong Terraform skills, including authoring and maintaining modules across environments.
- Monitoring experience with Grafana‑style dashboards and alerting; ability to define meaningful SLO/SLA signals.
- Proficient in Python and Shell; comfortable with Git and code reviews.
- Solid understanding of networking, security, containers, and Linux fundamentals.
- Experience in follow‑the‑sun/24×7 support with on‑call participation.
- Excellent written and verbal communication for global and customer‑facing work.
Preferred (Good to Have)
- Understanding of UI design and common UI tools for building simple internal portals or operational views.
- Experience working with databases (performance basics, HA/failover, backups/restores).
- Fluency with AI tools as a companion for research, code review, documentation, or incident triage.
Shift & On‑Call Expectations
- Assigned shift aligned with global operations; occasional adjustments for maintenance/projects.
- Participation in rotational on‑call for P1/P2 events per local policy; precise handoffs and status updates.
Education
- College degree in Computer Science, Information Technology, or a related field preferred
- Demonstrated, relevant experience may be substituted for a degree
- Kubernetes certification (e.g., CKA/CKAD/CKS) a plus
We champion flexibility and hybrid work options to support varying lifestyles and personal needs. At the same time, we value the power of in-person collaboration to build community, spark innovation, and strengthen connections. Our approach ensures you can work in ways that suit you best while still engaging with colleagues to share ideas and grow together.


