Related skills
datadog azure aws grafana prometheus
📋 Description
- Architect reliability platforms and self-service tooling.
- Lead AI-driven reliability via automation of diagnostics.
- Champion reliability culture across engineering.
- Lead incident response as Incident Commander during critical events.
- Advance observability with end-to-end monitoring and tracing.
- Mentor engineers across SRE and product teams.
🎯 Requirements
- 8+ years in SRE/DevOps; 3+ years in SRE at Senior level or above.
- Production SaaS systems at scale.
- Python, Go, or similar proficiency.
- AWS, GCP, or Azure + Kubernetes.
- Networking: TCP/IP, DNS, HTTP/S, load balancing.
- Monitoring/alerting: Prometheus, Grafana, Datadog, ELK.
- Advanced observability: OpenTelemetry (OTel), continuous profiling.
- Incident management and postmortems.