Staff Site Reliability Engineer

Added
11 days ago
Location
Type
Full time
Salary
Salary not provided

Related skills

Datadog, Azure, AWS, Grafana, Prometheus

📋 Description

  • Architect reliability platforms and self-service tooling.
  • Lead AI-driven reliability work by automating diagnostics.
  • Champion reliability culture across engineering.
  • Lead incident response as Incident Commander during critical events.
  • Advance observability with end-to-end monitoring and tracing.
  • Mentor engineers across SRE and product teams.

🎯 Requirements

  • 8+ years in SRE/DevOps; 3+ years in Senior+ SRE roles.
  • Experience running production SaaS systems at scale.
  • Proficiency in Python, Go, or a similar language.
  • AWS, GCP, or Azure + Kubernetes.
  • Networking: TCP/IP, DNS, HTTP/S, load balancing.
  • Monitoring/alerting: Prometheus, Grafana, Datadog, ELK.
  • Advanced observability: OpenTelemetry (OTEL), continuous profiling.
  • Incident management and postmortems.