io-tech-solutions-limited
DevOps/SRE
Job description
About the Role
We are seeking a skilled and motivated DevOps / Site Reliability Engineer (SRE) with 2+ years of experience to help us build, scale, and maintain robust, secure, and high-availability infrastructure. As a DevOps/SRE team member, you will work closely with development, QA, and operations teams to automate processes, monitor system health, and ensure the reliability of our services.
This is a hands-on role that requires strong technical skills, a deep understanding of modern DevOps tools and practices, and a problem-solving mindset.
Key Responsibilities
- Design, implement, and maintain CI/CD pipelines for reliable code deployment
- Monitor application performance and system reliability using tools like Prometheus, Grafana, or Datadog
- Maintain and improve cloud infrastructure (e.g., AWS, GCP, Azure) following best practices
- Manage infrastructure as code using tools such as Terraform, Ansible, or CloudFormation
- Troubleshoot infrastructure and application issues, ensuring minimal downtime and fast resolution
- Automate repetitive operational tasks and improve development workflows
- Implement and enforce security, backup, and disaster recovery strategies
- Participate in the on-call rotation, respond to incidents, and lead root cause analysis and postmortem reviews
- Work closely with development teams to ensure applications are designed for performance, availability, and scalability
- Optimize resource usage and costs across cloud environments
Qualifications
Required
- Bachelor's degree in Computer Science, Engineering, or a related field
- 2+ years of experience in a DevOps, SRE, or Systems Engineering role
- Hands-on experience with Linux/Unix system administration
- Experience with CI/CD tools such as Jenkins, GitHub Actions, CircleCI, or GitLab CI
- Working knowledge of cloud platforms (AWS, GCP, Azure)
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Experience with infrastructure as code tools like Terraform, Ansible, or similar
- Proficient in at least one scripting or programming language (e.g., Bash, Python, Go)
- Strong understanding of monitoring, logging, and alerting systems
- Experience with version control using Git
Preferred
- Experience with Kubernetes administration in production environments
- Familiarity with security best practices and compliance standards
- Understanding of networking, load balancing, and DNS configuration
- Exposure to incident management and SLA/SLO/SLI concepts
- Experience working in Agile environments


