Marscapital

Junior Site Reliability Engineer

Role: Junior Site Reliability Engineer

Job type: Full-time

Posted: 1 week ago

Salary: Not disclosed by employer

Job description

Role Overview

We are seeking a Junior Site Reliability Engineer (SRE) with foundational knowledge of AWS cloud infrastructure, containerised platforms, and CI/CD pipelines. The successful candidate will support the team in improving system reliability, availability, performance, and scalability while learning to deliver high-quality services efficiently.

This role provides an excellent opportunity to develop engineering and operational skills, with exposure to automation, observability, scalability, and resilience across cloud-native environments. As a junior engineer, you will work alongside senior engineers to learn and apply DevOps and SRE best practices, contributing to efforts that reduce operational toil and enhance system reliability.

Note: This is a reliability-focused engineering role with supported on-call responsibilities and involvement in platform modernisation initiatives.

Key Responsibilities

  • Assist in building, maintaining, and monitoring infrastructure on AWS under guidance from senior engineers.
  • Support the maintenance and improvement of CI/CD pipelines for automated build, test, and deployment processes.
  • Contribute to CI/CD workflows, including multi-stage pipelines and release processes.
  • Assist with the administration and support of Windows (IIS, .NET) and Linux-based production systems.
  • Support the deployment and maintenance of containerised applications using Docker and Kubernetes (EKS/AKS).
  • Assist in implementing Infrastructure as Code (IaC) using Terraform, CloudFormation, or ARM templates.
  • Write and maintain automation scripts using PowerShell, Bash, or Python.
  • Support the monitoring of SLIs, SLOs, and SLAs to help ensure system reliability.
  • Assist in configuring and maintaining monitoring, logging, and alerting solutions (CloudWatch, Prometheus, Grafana, Azure Monitor).
  • Participate in incident response and troubleshooting under the guidance of senior engineers, contributing to root cause analysis (RCA).
  • Support performance monitoring and capacity planning activities.
  • Work with development teams to understand and support deployment strategies (blue-green, canary releases).
  • Follow security and compliance best practices across CI/CD pipelines and infrastructure.

Required Skills & Experience

  • Up to 2 years of experience, or a relevant internship or placement, in Site Reliability Engineering, DevOps, or Infrastructure Engineering (graduate applicants with strong academic projects are welcome).
  • Basic understanding of AWS services (EC2, S3, RDS, VPC, IAM, ELB, CloudWatch).
  • Familiarity with CI/CD concepts and exposure to Azure DevOps Pipelines or similar tools.
  • Basic understanding of pipeline design and deployment strategies.
  • Foundational knowledge of Windows Server administration, including IIS and .NET applications.
  • Basic Linux system administration skills.
  • Awareness of containerisation concepts (Docker) and an interest in Kubernetes (EKS/AKS).
  • Exposure to Infrastructure as Code concepts (Terraform, CloudFormation, or ARM templates).
  • Basic scripting ability in at least one of PowerShell, Bash, or Python.
  • Familiarity with monitoring and logging concepts and tools (Prometheus, Grafana, ELK, CloudWatch).
  • Foundational understanding of networking, security, and cloud architecture principles.

Preferred Qualifications

  • A degree in Computer Science, Software Engineering, or a related discipline (or equivalent practical experience).
  • Hands-on experience with cloud platforms (AWS, Azure, or GCP) through personal projects, labs, or coursework.
  • Familiarity with version control systems (Git).
  • Exposure to Helm, GitOps practices, or service mesh concepts.
  • Interest in obtaining relevant certifications (AWS, Kubernetes, Azure DevOps).

Key Competencies / Characteristics

  • Eager learner: Proactively seeks to build knowledge in SRE practices, cloud-native technologies, and automation.
  • Reliability-minded: Develops an appreciation for uptime, performance, and system resilience.
  • Automation-curious: Interested in reducing manual effort and finding efficient solutions through scripting and tooling.
  • Accountable: Takes responsibility for assigned tasks and follows through to completion.
  • Strong communicator: Willing to ask questions, share progress, and document findings clearly.
  • Collaborative: Works well within a team, supporting colleagues across platform, security, and application teams.
