
SRE (Site Reliability Engineer)
Job description
Title
Site Reliability Engineer
Description
Job title: Site Reliability Engineer
Work setup: Hybrid, 2 days WFH and 3 days RTO (Location: Manila (Quezon Avenue, Quezon City))
Work shift: Night Shift
Start date: ASAP
Headcount: 1
Qualifications
- Bachelor’s degree in IT, Computer Science, Engineering, or a related field.
- 3+ years in monitoring/observability/SRE roles with hands‑on experience in Azure Monitor/App Insights (KQL) and ServiceNow Event Management.
- Strong knowledge of Azure Log Analytics, KQL, telemetry, and APM implementations.
- Demonstrated ability to collaborate across IT Operations, platform, cyber, network, and product teams; strong written and verbal communication for standards and enablement.
- 5+ years of experience in an SRE role and a deep understanding of monitoring and application performance management.
- Knowledge of SLO platforms (e.g., Nobl9) and experience contributing to standards/governance artifacts.
- Knowledge of proactive monitoring using Azure Monitor services, telemetry, and synthetic transactions.
- Understanding of network architecture and security: WAN/LAN, TCP/IP, PKI.
- Familiarity with ITSM processes and tools (e.g., ServiceNow) and with compliance processes.
- Vision for and awareness of AIOps.
- Stable employment history (not a job hopper).
Responsibilities
• You will design and define standards, patterns, and automation opportunities that elevate monitoring and reliability across platforms and applications, with a strong focus on Azure Monitor, ServiceNow ITOM Event Management, Grafana, and APM/Synthetics tooling.
• You’ll partner with product teams to implement SLO/SLI‑driven operations, reduce alert noise, accelerate incident response, and embed self‑healing where it matters most.
• Engineer enterprise monitoring and event patterns by authoring and maintaining reference architectures, runbooks, and event management models (alert → event → incident) with actionable alerts and incident routing.
• Contribute to the Monitoring, Observability, and Event Management strategy and to tooling intake/governance checkpoints, and coach product teams.
• Drive continuous improvement through excellent communication: reduce alert noise, shorten MTTR, and improve change success by embedding postmortem learnings into patterns, rules, and pipelines.
Must-have Skills
- Cloud Observability – Azure Monitor/App Insights/Log Analytics (KQL).
- Knowledge of Grafana, Prometheus, AppDynamics, and ThousandEyes.
- Communication & Teaming – Able to translate complex reliability patterns into consumable standards and coach the IT Operations team via office hours/CoP sessions.
- Technical Depth in Monitoring and Observability Stack – Hands‑on in ServiceNow Event Management, Azure Monitor/KQL, and automation.
- Analytical & Systems Thinking – Uses SLIs/SLOs, postmortems, and CMDB context to reduce noise, drive self‑healing, and measurably improve MTTR and KPIs.
Recruitment Process
- Paper screening (the profile is endorsed to the Operations team to determine whether the candidate meets the basic requirements/qualifications for the position)
- L1 Interview (with the Practice/Operations team)
- L2 Interview (optional)
- Technical assessment/Final interview (conducted by the customer's Operations team)
Pre-screening Notes
- What is your highest educational attainment?
- How many years of relevant experience do you have in monitoring/observability/SRE roles with hands‑on experience in Azure Monitor/App Insights (KQL) and ServiceNow Event Management?
- How many years of relevant experience do you have in an SRE role, with a deep understanding of monitoring and application performance management?
- What was your last drawn salary?
- What is your salary expectation?
- Are you amenable to a hybrid work setup in Quezon City on a night shift schedule?
- When are you available to start once hired?


