Join Our Team

Oowlish, one of Latin America's rapidly expanding software development companies, is seeking experienced technology professionals to enhance our diverse and vibrant team.

As a valued member of Oowlish, you will collaborate with premier clients from the United States and Europe, contributing to pioneering digital solutions. Our commitment to a nurturing work environment has earned us certification as a Great Place to Work, and you will have opportunities for professional development, growth, and significant international impact.

We offer the convenience of remote work, allowing you to craft a work-life balance that suits your personal and professional needs. We're looking for candidates who are passionate about technology, proficient in English, and excited to collaborate remotely with teams around the world.

About the Role

We are seeking a DevOps & Site Reliability Engineer to join a growing AI-focused SaaS startup. In this role, you’ll be responsible for maintaining, optimizing, and scaling the infrastructure that supports our platform, ensuring high availability, performance, and reliability.

You’ll work closely with development and product teams to improve deployment processes, monitor systems, and respond to incidents proactively.

If you are passionate about DevOps culture, automation, and ensuring systems are always running smoothly, this is the perfect opportunity for you!

Responsibilities

Design, implement, and improve Site Reliability Engineering practices across production environments.
Define, manage, and continuously improve Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
Lead and participate in incident response and incident command processes.
Build and evolve observability strategies, including monitoring, logging, alerting, and distributed tracing.
Improve system reliability, availability, scalability, and operational efficiency.
Partner with engineering teams to improve application performance and production readiness.
Develop automation solutions that reduce operational overhead and improve reliability.
Participate in root cause analysis and post-incident reviews.
Drive continuous improvement initiatives based on operational insights and incident learnings.
Help establish reliability best practices across teams and services.

Requirements

5+ years of professional experience in Site Reliability Engineering, DevOps, or Production Engineering roles.
Strong understanding of Site Reliability Engineering principles and best practices.
Experience supporting and operating production systems at scale.
Strong knowledge of monitoring, observability, and reliability engineering concepts.
Experience working in cloud-based environments.
Strong troubleshooting and problem-solving skills.
Experience working with distributed systems and modern application architectures.

Must have

Proven Site Reliability Engineering experience.
Experience defining and managing Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
Experience leading or actively participating in Incident Command and Incident Response processes.
Experience designing and implementing observability strategies.
Hands-on experience with monitoring, logging, alerting, and distributed tracing.
Experience improving system reliability, availability, and operational excellence.
Experience supporting mission-critical production environments.
Experience with cloud platforms (AWS preferred).
Strong automation mindset.
Experience conducting root cause analysis and postmortems.

Nice to have

Kubernetes experience.
Terraform or Infrastructure as Code experience.
CI/CD pipeline experience.
Experience with containerized environments.
Experience with distributed microservices architectures.
Experience with performance engineering.
Experience mentoring engineers on reliability practices.
Multi-cloud experience.
Experience working in highly regulated or high-availability environments.

Benefits & Perks

Home office
Competitive compensation based on experience
Career plans to allow for extensive growth in the company
International Projects
Oowlish English Program (Technical and Conversational)
Oowlish Fitness with Total Pass
Games and Competitions

You can also apply here

Website: https://www.oowlish.com/work-with-us/
LinkedIn: https://www.linkedin.com/company/oowlish/jobs/
Instagram: https://www.instagram.com/oowlishtechnology/

Senior Site Reliability Engineer (SRE)
