Description
- Architect reliability platforms and lead cross-team incident response.
- Drive 99.99% uptime via scalable platforms and processes.
- Build self-service tooling to empower teams to own reliability.
- Lead AI-driven reliability with automated diagnostics and remediation.
- Champion SRE culture through design reviews and production readiness.
- Lead incidents and run postmortems that produce lasting improvements.
Requirements
- 8+ years in SRE/DevOps with 3+ years in Senior+ SRE.
- Strong background running production SaaS at scale.
- Proficiency in Python, Go, or similar.
- Hands-on with AWS, GCP, or Azure and Kubernetes.
- Deep networking fundamentals: TCP/IP, DNS, HTTP/S, load balancing.
- Experience with monitoring/alerting: Prometheus, Grafana, Datadog, ELK.
- Familiarity with OpenTelemetry (OTel) and continuous profiling.
- Proven experience with incident management and postmortems.
Benefits
- Equity may be offered.
- Generous benefits program.