Site Reliability Engineer II (Remote)

Added: 6 days ago
Type: Full time
Salary: Not provided

Related skills

sre azure terraform grafana prometheus

📋 Description

  • Design and implement monitoring strategies.
  • Collaborate on observability tooling, including Grafana, Prometheus/Loki, and Azure Monitor.
  • Define golden signals, SLOs/SLIs, and alerting.
  • Drive logging, tracing, and OpenTelemetry standards.
  • Monitor production systems and ensure they are properly instrumented.
  • Own production incident response and remediation.

🎯 Requirements

  • 5+ years in SRE/Production Engineering or a related operations role.
  • Strong knowledge of cloud-native systems, ideally Azure.
  • Experience with observability tooling (Grafana, Prometheus/Loki, Azure Monitor, App Insights).
  • Understanding of DR concepts, failover validation, and operational readiness.
  • Familiarity with chaos engineering practices (nice-to-have).
  • Strong SRE principles: SLOs/SLIs, error budgets, toil reduction, postmortems.