Staff Site Reliability Engineer

Type: Full time

Related skills

SaaS, Python, Kubernetes, 8+ years in SRE/DevOps with 3+ years in Senior+ SRE, TCP/IP, DNS, HTTP/S, load balancing

📋 Description

  • Architect reliability platforms and lead cross-team incident response.
  • Drive 99.99% uptime via scalable platforms and processes.
  • Build self-service tooling to empower teams to own reliability.
  • Lead AI-driven reliability with automated diagnostics and remediation.
  • Champion SRE culture through design reviews and production readiness.
  • Incident leadership: run incidents and postmortems that produce lasting improvements.

🎯 Requirements

  • 8+ years in SRE/DevOps, including 3+ years at Senior level or above in SRE.
  • Strong background running production SaaS at scale.
  • Proficiency in Python, Go, or similar.
  • Hands-on with AWS, GCP, or Azure and Kubernetes.
  • Deep networking fundamentals: TCP/IP, DNS, HTTP/S, load balancing.
  • Experience with monitoring/alerting: Prometheus, Grafana, Datadog, ELK.
  • Familiarity with OpenTelemetry (OTel) and continuous profiling.
  • Proven experience with incident management and postmortems.

🎁 Benefits

  • Equity may be offered.
  • Generous benefits program.