Who We Are

Kensho is S&P Global’s hub for AI innovation and transformation. With expertise in Machine Learning and data discovery, we develop and deploy novel solutions for S&P Global and its customers worldwide. Our solutions help businesses harness the power of data and Artificial Intelligence to innovate and drive progress. Kensho's solutions and research focus on Generative AI, LLM Agents, speech recognition, entity linking, document extraction, text classification, natural language processing, and more.

At Kensho, we hire talented people and give them the autonomy and support needed to build amazing technology and products. We collaborate using our teammates' diverse perspectives to solve hard problems. Our communication with one another is open, honest, and efficient. We dedicate time and resources to explore new ideas, but always rooted in engineering best practices. As a result, we can innovate rapidly to produce technology that is scalable, robust, and useful.

About the Role

As a Senior Site Reliability Engineer (SRE) at Kensho, you will be a hands-on technologist who combines strong infrastructure expertise with solid, Python-first software engineering skills. You will be responsible for ensuring the reliability, scalability, and security of both business-critical internal systems and external, customer-facing services.

You will work closely with Infrastructure, Application, and Security teams to design resilient systems, automate operations, and continuously improve platform stability. This role requires deep ownership of production systems, strong troubleshooting skills across infrastructure, container orchestration systems, networking, and applications, and comfort operating in a 24/7 on-call environment.

What You’ll Do

Own and operate production services supporting critical financial applications with a strong focus on availability, performance, and reliability
Design, build, and manage AWS infrastructure, including EKS-based clusters, across lower and production environments
Provision and manage infrastructure using Terraform (Infrastructure as Code) with a strong automation-first mindset
Deploy, scale, and troubleshoot applications running on Kubernetes or similar platforms, including cluster creation, upgrades, and lifecycle management
Build and maintain Python-based automation frameworks and tooling to reduce operational toil and prevent recurring incidents
Monitor system health using metrics, logs, and alerts; continuously tune alerts, dashboards, and runbooks
Troubleshoot complex issues spanning clusters, networking, certificates, deployments, and application behavior
Manage certificate lifecycle and expiration, ensuring secure and uninterrupted service operation
Collaborate with InfoSec, Vulnerability Management, and Network Security teams (e.g., Zscaler) to maintain a strong security posture
Collaborate with L1/L2 teams, helping them understand infrastructure and operational best practices
Participate in the on-call rotation, lead incident response, drive root cause analysis, and ensure effective post-incident remediation and learnings
Identify architectural anti-patterns and drive improvements by reviewing new services for production readiness, resiliency, and secure design prior to release
Establish and enforce production readiness standards, including deployment strategies, rollback plans, and observability requirements
Optimize infrastructure cost and resource utilization without compromising reliability and performance

What We Look For

6+ years of experience in SRE, DevOps, Platform, or Infrastructure Engineering roles
Strong software engineering background, with hands-on Python development used for automation, tooling, and system reliability
Experience building or supporting scalable, distributed systems in production
Deep experience with AWS cloud environments, including IAM, networking, and access controls
Strong hands-on expertise with Kubernetes or similar tools (EKS preferred): cluster creation, deployments, scaling, and troubleshooting
Solid understanding of networking fundamentals (VPCs, routing, DNS, load balancing, security groups)
Experience with CI/CD pipelines, deployment tools, and infrastructure automation
Working knowledge of databases and query optimization, and understanding how applications behave under load
Familiarity with Kafka or similar messaging systems
Comfortable conducting code reviews and participating in coding-focused interviews
Strong operational mindset with experience in incident management and on‑call rotations
Clear communicator and collaborative teammate who values documentation and knowledge sharing

How to Really Get Our Attention

Demonstrated ownership of large-scale production systems
Strong examples of Python-based automation or internal tooling
Contributions to open‑source projects, infrastructure platforms, or reliability tooling
Experience working closely with security and compliance teams in regulated environments

Technologies We Like

AWS, Amazon EKS, Terraform, Jsonnet
Kubernetes or similar tools, Helm, CI/CD tooling
Python (automation, tooling, reliability engineering)
Prometheus, Grafana, logging and monitoring platforms
PostgreSQL and other production databases
Kafka or similar event-driven systems
Linux (Ubuntu or similar)

What’s In It For You?

Our Mission:

Advancing Essential Intelligence.

Our People:

We're more than 35,000 strong worldwide, so we're able to understand nuances while having a broad perspective. Our team is driven by curiosity and a shared belief that Essential Intelligence can help build a more prosperous future for us all. From finding new ways to measure sustainability to analyzing energy transition across the supply chain to building workflow solutions that make it easy to tap into insight and apply it, we are changing the way people see things and empowering them to make an impact on the world we live in. We’re committed to a more equitable future and to helping our customers find new, sustainable ways of doing business. Join us and help create the critical insights that truly make a difference.

Our Values:
Integrity, Discovery, Partnership

Throughout our history, the world's leading organizations have relied on us for the Essential Intelligence they need to make confident decisions about the road ahead. We start with a foundation of integrity in all we do, bring a spirit of discovery to our work, and collaborate in close partnership with each other and our customers to achieve shared goals.

Benefits:

We take care of you, so you can take care of business. We care about our people. That’s why we provide everything you—and your career—need to thrive at S&P Global.

Our benefits include:

Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.

For more information on benefits by country visit: https://spgbenefits.com/benefit-summaries

Global Hiring and Opportunity at S&P Global:

At S&P Global, we are committed to fostering a connected and engaged workplace where all individuals have access to opportunities based on their skills, experience, and contributions. Our hiring practices emphasize fairness, transparency, and merit, ensuring that we attract and retain top talent. By valuing different perspectives and promoting a culture of respect and collaboration, we drive innovation and power global markets.

Recruitment Fraud Alert:

If you receive an email from a spglobalind.com domain or any other regionally based domains, it is a scam and should be reported to reportfraud@spglobal.com. S&P Global never requires any candidate to pay money for job applications, interviews, offer letters, “pre-employment training” or for equipment/delivery of equipment. Stay informed and protect yourself from recruitment fraud by reviewing our guidelines, fraudulent domains, and how to report suspicious activity here.

-----------------------------------------------------------

Equal Opportunity Employer

S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment.

If you need an accommodation during the application process due to a disability, please send an email to: EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person. 

US Candidates Only: Know Your Rights: Workplace discrimination is illegal

-----------------------------------------------------------

20 - Professional (EEO-2 Job Categories-United States of America), BSMGMT203 - Entry Professional (EEO Job Group)

Senior Site Reliability Engineer - Infrastructure
