Site Reliability Engineer (SRE) – AWS + Docker
Company: Synechron
Location: India
Job type: Full-time
Job description
Job Summary
Synechron is seeking a Site Reliability Engineer (SRE) to improve the reliability, scalability, and performance of cloud-native systems. This role supports production operations through AWS infrastructure management, containerized workload operations, CI/CD enablement, observability, and incident response. The position contributes to business goals by improving availability, reducing operational risk, and supporting cost-efficient system performance.
Software Requirements
Required
AWS: strong hands-on experience with EC2, ECS/EKS, IAM, VPC, ALB/NLB, Route 53, S3, CloudWatch
Docker
Container orchestration using EKS/Kubernetes or ECS
CI/CD using GitHub Actions, Jenkins, or Azure DevOps
IaC using Terraform or CloudFormation
Observability tools: CloudWatch, Prometheus/Grafana, ELK/OpenSearch, X-Ray
Automation using Python and/or Bash
Linux system administration and troubleshooting
Networking knowledge covering DNS, TCP/IP, TLS, security groups, NACLs
Preferred
Experience with CloudFront, RDS, ElastiCache, ASG
Blue/green and canary deployment strategies
Artifact management and release approval workflows
Vulnerability scanning and secrets management tools
Overall Responsibilities
Define and maintain SLOs, SLIs, SLAs, and error budgets
Build and manage AWS infrastructure for scalable, highly available systems
Operate containerized services using Docker and ECS/EKS/Kubernetes
Implement and optimize CI/CD pipelines and deployment strategies
Establish observability through metrics, logs, and traces
Automate infrastructure and operations using IaC and scripting
Manage incident response, runbooks, root-cause analysis, and remediation
Drive performance tuning, capacity planning, and cost optimization
Implement security best practices across infrastructure and deployments
Partner with development teams to improve reliability by design
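To illustrate the SLO and error budget item in the list above, here is a minimal Python sketch of how the remaining error budget might be computed for a request-based availability SLO. The function name, its parameters, and the 99.9% target in the usage note are hypothetical and are not taken from Synechron's systems.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget.

    slo_target is the availability objective as a fraction (e.g. 0.999).
    The result is 1.0 when no budget is spent, 0.0 when it is exactly
    exhausted, and negative when the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if failed_requests < 0:
        raise ValueError("failed_requests must not be negative")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

For example, with a 99.9% target over 1,000,000 requests, roughly 1,000 failures are tolerated, so 250 failed requests would leave about three quarters of the budget unspent.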
Technical Skills (By Category)
Programming Languages
Essential: Python, Bash
Preferred: Scripting for operational automation and diagnostics
Databases / Data Management
Essential: Operational familiarity with RDS and ElastiCache in production environments
Preferred: Performance tuning and availability planning for managed data services
Cloud Technologies
Essential: AWS including EC2, ECS/EKS, IAM, VPC, ALB/NLB, Route 53, S3, CloudWatch
Preferred: CloudFront, Auto Scaling Groups, advanced cost optimization practices
Frameworks and Libraries
Essential: Docker, Kubernetes/EKS or ECS
Preferred: Reliability patterns such as circuit breakers, retries, backoff, health checks
Development Tools and Methodologies
Essential: CI/CD, Terraform or CloudFormation, monitoring and alerting, incident response, Linux troubleshooting
Preferred: Blue/green and canary deployments, release engineering improvements
Security Protocols
Essential: Least-privilege IAM, SSL/TLS, secrets handling, vulnerability awareness
Preferred: Automated scanning, policy enforcement, and remediation workflows
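As an example of the reliability patterns named under Frameworks and Libraries above, the following is a minimal Python sketch of retries with exponential backoff and full jitter. The helper name `retry_with_backoff`, its parameters, and the default values are assumptions chosen for illustration; they do not describe any specific Synechron tooling or library API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T],
                       max_attempts: int = 5,
                       base_delay: float = 0.5,
                       max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation until it succeeds or max_attempts is reached.

    The wait before each retry is drawn uniformly from
    [0, min(max_delay, base_delay * 2**attempt)] ("full jitter"),
    which spreads out retries from many clients.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

A production version would usually retry only on errors known to be transient (timeouts, throttling) rather than on every exception; catching `Exception` here keeps the sketch short. The `sleep` parameter is injectable so the function can be exercised without real delays.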
Experience Requirements
7+ years of experience in SRE, DevOps, or Cloud Operations
Experience owning production infrastructure and reliability outcomes
Strong experience with AWS, Docker, orchestration, CI/CD, IaC, and incident response
Experience improving MTTR, availability, and operational efficiency
Equivalent experience in related production engineering roles will also be considered
Day-to-Day Activities
Maintain AWS environments and containerized services
Monitor system health, alerts, logs, and traces
Improve deployment pipelines and release reliability
Participate in incident response, troubleshooting, and postmortems
Update runbooks, dashboards, and automation scripts
Work with Dev, QA, and Security teams on resilience and operational readiness
Join standups, planning sessions, reviews, and reliability discussions
Qualifications
Required
Bachelor’s degree in Computer Science, Engineering, Information Technology, or a related field, or equivalent practical experience
Preferred
AWS, Kubernetes, Terraform, or cloud operations certifications
Ongoing learning in reliability engineering, security, and performance optimization
Professional Competencies
Strong analytical and problem-solving skills
Clear communication and effective documentation
Collaboration across engineering, QA, and security teams
Ability to prioritize operational work and planned improvements
Adaptability in production and incident-driven environments
Focus on reliability, efficiency, and continuous improvement
SYNECHRON’S DIVERSITY & INCLUSION STATEMENT
Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture that promotes equality and diversity in an environment that is respectful to all. As a global company, we strongly believe that a diverse workforce helps build stronger, more successful businesses. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.
All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.


