Sentinellabs
Staff Infrastructure Engineer — Observability
Job description
Our Purpose
At SentinelOne, we are driven by a clear purpose: to give the advantage to those who secure our future. As AI reshapes how organizations build, operate, and innovate, the responsibility to protect them becomes more critical than ever. When you join SentinelOne, your work helps protect global enterprises, critical infrastructure, and the technologies shaping tomorrow. If you are motivated by meaningful challenges and want your impact to be real, measurable, and global, you will find purpose here.
About Us
SentinelOne is a company at the intersection of AI and security, pioneering a new operating model for cybersecurity. Our AI-native platform unifies protection across endpoint, cloud, identity, data, and AI systems to deliver autonomous detection and response with clarity and speed. By combining real-time analytics, intelligent automation, and a unified data foundation, we reduce noise, simplify complexity, and empower security teams to focus on what truly matters.
Our teams are builders, problem-solvers, and innovators committed to shaping the future of security. If you are excited to solve hard problems alongside talented, mission-driven people, we invite you to help us build a safer future for humanity.
What Are We Looking For?
We’re looking for people who are relentlessly curious and committed to continuous learning. AI is reshaping every function across our business, and we enable every team member, regardless of role or level, to build fluency in AI tools and concepts. Those who thrive here actively seek out new solutions, experiment thoughtfully, and apply what they learn to drive better, faster, smarter outcomes.
As a Staff Infrastructure Engineer, you'll be a pivotal technical leader and architect within our Observability team, driving strategic initiatives and shaping the future of our critical systems. You will leverage your deep expertise to design, implement, and optimize solutions that underpin SentinelOne's global platform, directly empowering engineering teams across the organization.
We are seeking a candidate who is driven by a deep passion for observability and technical leadership. Imagine architecting the core systems that provide SentinelOne with real-time, global visibility, delivering actionable platform insights precisely when they are needed. In this high-impact role, you'll design and implement robust, secure solutions for high-volume data ingestion, storage, and analysis, shaping how we understand and optimize our platform health. This is your chance to take end-to-end ownership of critical infrastructure, mentor talented engineers, and accelerate software delivery across our entire engineering organization.
Due to Federal Government contract requirements, U.S. Citizenship is required for this position.
FedRAMP staff may be subject to customer or third-party background checks up to and including Secret Clearance if required by their role at SentinelOne.
What Will You Do?
Primary responsibilities include:
- Architect and implement robust, scalable telemetry platforms that empower SentinelOne engineers to deploy and monitor features with speed, safety, and reliability.
- Act as the primary Subject Matter Expert (SME) and administrator for our core observability stack, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OpenTelemetry (OTEL) pipelines.
- Partner strategically with diverse engineering teams across the organization to define platform requirements, ensuring the observability ecosystem evolves ahead of stakeholder needs.
- Take complete ownership of critical features, from initial architectural design and requirements refinement through to production deployment and operational maturity.
- Drive operational efficiency for critical observability services across AWS and GCP, balancing system reliability with cloud cost optimization.
- Build robust automation and self-service tooling to drastically reduce operational toil, optimize resource utilization, and minimize pager fatigue.
- Drive the deployment, maintenance, and compliance of observability systems in critical, high-security environments, including FedRAMP and air-gapped deployments.
- Cultivate platform transparency and reliability by rigorously implementing IaC (Terraform/Ansible) and standardizing industry best practices.
- Elevate engineering quality by mentoring team members, leading comprehensive technical design and code reviews, and providing constructive feedback that fosters growth.
- Lead the swift resolution of highly complex production incidents, perform thorough root-cause analyses, and participate in on-call rotations to ensure peak system integrity.
What Skills and Knowledge Should You Bring?
Ideal candidates will have:
- 8+ years of experience in Infrastructure Engineering, Site Reliability Engineering (SRE), or a related systems-focused field.
- 8+ years of experience architecting, scaling, and managing enterprise-grade observability stacks using Prometheus, Grafana, Thanos (or Mimir/Cortex), and OpenTelemetry (OTEL).
- Experience designing and engineering cloud-native infrastructure within major cloud providers (AWS or GCP) and managing production Kubernetes environments (EKS, GKE).
- Advanced proficiency with IaC and automation tools, specifically Terraform and Ansible, to manage immutable infrastructure.
- Experience maintaining and optimizing high-throughput, large-scale distributed systems with a focus on cost-efficiency, scalability, and disaster recovery.
- Demonstrated ability to lead complex technical designs, mentor other engineers, and collaborate cross-functionally with product and application teams.
- U.S. citizenship and the ability to work in a government-regulated environment.
Preferred Qualifications
- 8+ years of production-level programming experience in GoLang (highly desirable) or another mainstream language (e.g., Python, Java) with a strong willingness to adopt GoLang.
- Experience working with high-security compliance frameworks, specifically FedRAMP or other sovereign cloud requirements.
- Familiarity with the unique operational challenges of on-premises, hybrid, or air-gapped Kubernetes deployments.
- Experience designing advanced CI/CD pipelines (e.g., GitHub Actions) and implementing sophisticated deployment strategies (canary, blue-green, rolling updates).
Why SentinelOne?
AI is redefining how the world operates and rewriting the rules of security in real time, and SentinelOne was built for this moment. From day one, we architected an AI-native platform designed to operate at machine speed, not as an add-on to legacy systems but as the foundation itself. If you want to build where innovation and impact move together, this is that place.
We invest in our Sentinels with comprehensive, competitive benefits designed to support you and your family:
Equity & Rewards
- Restricted Stock Units (RSUs)
- Employee Stock Purchase Plan (ESPP)
Time Off & Wellbeing
- Flexible time off
- Paid company holidays and paid sick time
- Gender-neutral parental leave
- Grandparent leave
Insurance & Financial Security
- Medical, dental, and vision coverage
- 401(k) retirement plan with company match
- Life and disability insurance
- Health and dependent care FSA
- Voluntary benefits (hospital, accident, critical illness)
- Employee Assistance Program (EAP)
- ARAG pre-paid legal
- Nationwide pet insurance
- Cancer Care program
- Global business travel medical insurance
Work Perks & Flexibility
- Home office allowance
- Mobile phone reimbursement
Wellness & Lifestyle
- Wellness coach
- Wellness/gym reimbursement
- Fertility coverage
- Adoption & surrogacy reimbursement
This U.S. role has a base pay range that will vary based on the location of the candidate. For some locations, a different pay range may apply. If so, this range will be provided to you during the recruiting process. You can also reach out to the recruiter with any questions.
SentinelOne is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.
SentinelOne participates in the E-Verify Program for all U.S.-based roles.


