Tellius is a fast-growing AI analytics company headquartered in Reston, Virginia. We are transforming how enterprises interact with their data — moving beyond dashboards and static reports to an always-on, agentic intelligence platform that investigates, reasons, and delivers finished answers autonomously.

Our platform, powered by Kaiya — Tellius's AI agent — connects to all your enterprise data, automatically investigates what changed and why, and delivers board-ready briefings, memos, and presentations before you even ask. We serve some of the world's most innovative companies, including Novo Nordisk, PepsiCo, Regeneron, AbbVie, and Biogen — helping their teams go from question to decision in minutes, not days.

We are recognized as a Visionary in the Gartner Magic Quadrant for Analytics & Business Intelligence Platforms, and we're just getting started. Our team is built for people who thrive on solving hard problems, moving fast, and building things that matter at enterprise scale.

About the Role

We are looking for a Senior DevOps Engineer to own the infrastructure that powers Tellius at scale. This isn't a maintenance role — it's a builder role. You will be responsible for the reliability, performance, and security of the cloud and Kubernetes infrastructure that our AI agents, analytical engines, and enterprise customer deployments run on, 24/7.

As we scale our agentic platform — including AI workloads, LLM-backed investigation pipelines, and high-throughput data processing — the infrastructure underneath needs to be bulletproof. You will set architectural direction, drive automation, and ensure our platform can grow without breaking.

This is a senior, hands-on position. You will partner closely with engineering, QA, and product teams to make releases faster and safer — and you will mentor junior engineers, raising the bar on operational standards across the team.

Key Responsibilities

Infrastructure & Platform Ownership

Own the architecture, scalability, and reliability of cloud infrastructure across AWS (preferred), Azure, or GCP. Manage and automate infrastructure using Terraform and IaC practices. Administer and harden containerized workloads using Docker and Kubernetes, including cluster lifecycle, scaling, and production troubleshooting.

Customer Deployment & Enablement

Lead private Kubernetes deployments of Tellius on customer-managed cloud environments, working directly with customer IT and infrastructure teams. Manage the full deployment lifecycle (upgrades, health checks, and troubleshooting) across diverse multi-cloud environments. Act as the primary technical point of contact during customer deployments, navigating varied security, compliance, and networking requirements.

Delivery & Automation

Design, build, and continuously improve CI/CD pipelines for frequent, zero-downtime deployments. Implement and mature GitOps workflows using ArgoCD or FluxCD. Automate routine operational tasks through scripting in Shell or Python.

Reliability, Security & Observability

Build and maintain observability using AI-assisted AIOps tools alongside Prometheus, Grafana, and ELK. Leverage AI-driven log analysis to reduce alert noise, predict failures, and accelerate root cause identification. Implement robust backup, disaster recovery, and security practices. Strengthen network security, load balancing, and TLS management across production systems.

Collaboration & Leadership

Mentor junior DevOps engineers through code review, pairing, and documentation. Collaborate with engineering and QA to streamline release cycles and improve delivery velocity. Own production incidents, root-cause analysis, and long-term reliability improvements.

Required Skills & Qualifications

3–7 years of hands-on experience in a DevOps or SRE role, including time in a senior or lead capacity.
Strong, production-grade experience with at least one major cloud provider (AWS preferred).
Deep experience managing, scaling, and troubleshooting Docker and Kubernetes clusters in production.
Solid working knowledge of Terraform and Infrastructure as Code practices.
Strong scripting and automation skills in Shell and/or Python.
Hands-on experience with monitoring and observability stacks (Prometheus, Grafana, ELK, or similar).
Solid understanding of networking fundamentals: network security, load balancing, TCP/IP, and SSL/TLS.
Strong command of Linux/Unix operating systems.
Excellent interpersonal and communication skills, with the ability to work across all organizational levels.
Hands-on experience implementing GitOps with ArgoCD or FluxCD.
Prior experience mentoring or leading junior engineers.

What We Offer

Tellius offers a highly competitive compensation package commensurate with your experience, along with full benefits and continual career and compensation growth. You will work on meaningful, large-scale infrastructure problems with a team that values ownership, curiosity, and impact.
