Vosker

Site Reliability Engineer - Defendec/Reconeyez

Job type

Full-time

Salary

Not disclosed by employer

Job description

We’re looking for a Site Reliability Engineer to join our DevOps team in Tallinn and take ownership of keeping the Reconeyez platform healthy and available. You’ll monitor our infrastructure, respond to incidents during on-call shifts, diagnose issues across the stack, and continuously improve our operational posture, while building and automating the tools and processes that keep our systems running reliably.

This is a hands-on role. You’ll spend your time in dashboards, terminals, and log files. When something breaks, you’re the one who finds out why and makes sure it doesn’t happen again.

What You’ll Do

Platform Reliability & Incident Response

  • Keep production running: monitor system health, respond to alerts, and resolve incidents before they impact customers
  • Participate in on-call rotation with the DevOps team, taking responsibility for incident response and resolution during your shifts
  • Document runbooks and incident postmortems so the team learns from every outage
  • Collaborate with development teams to improve reliability, flag recurring issues, and advocate for operational improvements

Infrastructure & Automation

  • Build and set up new development tools and infrastructure; deploy updates and fixes
  • Automate and improve development and release processes using Git-based workflows and PR-based operations
  • Manage containerized services running on Docker/Podman: deployments, restarts, resource management, and health checks
  • Configure and maintain network services including firewalls, load balancers, and VPNs
  • Contribute to infrastructure-as-code practices to make infrastructure changes auditable and repeatable

Observability & Monitoring

  • Manage and improve monitoring using Zabbix, Grafana, Prometheus, and Alertmanager: build dashboards, tune alerts, and reduce noise
  • Adopt and extend OpenTelemetry instrumentation across services for unified tracing, metrics, and logging
  • Analyze logs to identify root causes, spot patterns, and catch problems early
  • Monitor AI/ML inference endpoints and model-serving infrastructure: track latency, throughput, and model health alongside traditional service metrics
  • Monitor and flag infrastructure cost anomalies to support cloud spend awareness across the team

Databases & Security

  • Install, monitor, and maintain PostgreSQL, including backups, recovery, performance tuning, and query troubleshooting
  • Ensure systems are safe and secure against cybersecurity threats, including container image scanning and supply chain security practices

Platform Engineering

  • Reduce cognitive load for development teams through tooling, automation, and self-service capabilities
  • Build internal tools and processes that help developers move faster without sacrificing reliability

Must Have

  • Solid experience with Linux systems administration: comfortable in a terminal and able to navigate a production system under pressure
  • Hands-on experience with Docker and/or Podman for managing containerized services
  • Working knowledge of Grafana, Zabbix, and/or Prometheus for monitoring and alerting
  • Familiarity with OpenTelemetry as a modern observability standard
  • Experience with log analysis and troubleshooting: reading logs, correlating events, and tracing issues across services
  • Knowledge of systems and platform security, including secrets management and access control
  • Comfortable with Git-based workflows and PR-based infrastructure changes
  • Willingness to be on call: you understand the responsibility and can respond effectively during off-hours
  • Calm under pressure: incidents happen, and you stay focused, communicate clearly, and fix things methodically
  • Independent problem solver: when an alert fires at 2 AM, you can diagnose and act without someone guiding you
  • Strong communicator and team player, able to work closely with colleagues in Tallinn
  • Fluency in English and Estonian

Nice to Have

  • Experience with Elasticsearch or similar log aggregation and search platforms
  • Experience administering PostgreSQL, including backups, performance tuning, and query troubleshooting
  • Familiarity with infrastructure-as-code tools such as Ansible, Salt, or Terraform
  • Networking fundamentals: DNS, firewalls, load balancers, and VPNs
  • Exposure to AI/ML infrastructure, such as model serving and inference endpoints
  • Experience with supply chain security practices, such as container image scanning and dependency auditing
  • Experience with incident management processes and tooling
  • Degree in Computer Science, Engineering, or a related field, or equivalent practical experience

Level

Mid-level (2–5+ years in operations, DevOps, or systems administration). We value reliability instincts and troubleshooting depth over breadth across cloud platforms.

Other Details

  • Reports to the DevOps team manager in Tallinn
  • New role with immediate start
  • Career progression opportunities as the company grows
  • Competitive salary, discussed and agreed based on qualifications and experience