Site Reliability Engineer II

Added: 9 days ago
Type: Full-time
Salary: Not provided

Related skills

Azure, Terraform, Grafana, Prometheus, OpenTelemetry

📋 Description

  • Design and implement monitoring strategies across systems.
  • Collaborate on Grafana, Prometheus/Loki, and Azure Monitor.
  • Define golden signals, SLOs/SLIs, and actionable alerting.
  • Drive logging, tracing, and OpenTelemetry standards for teams.
  • Lead production incident response and remediation.
  • Conduct blameless post-incident reviews and follow-ups.

🎯 Requirements

  • 5+ years in SRE/Production Engineering or related roles.
  • Strong knowledge of cloud-native systems, preferably Azure.
  • Experience with observability tooling (Grafana/Prometheus/Loki/Azure Monitor).
  • Understanding of disaster recovery (DR) concepts, failover validation, and readiness.
  • Strong grasp of SRE principles: SLOs/SLIs, error budgets, toil.
  • Strong collaboration and communication skills.