Related skills
datadog terraform grafana prometheus pythonπ Description
- Own reliability, availability, and performance of production systems; define SLOs.
- Design and build autonomous AI agents for monitoring distributed systems.
- Lead on-call incidents; drive blameless postmortems and incident response.
- Build automation and Infrastructure-as-Code with Terraform.
- Improve observability with Datadog, Prometheus, Grafana, OpenTelemetry.
- Collaborate with teams and mentor junior engineers.
π― Requirements
- 5+ years in SRE/DevOps/Platform Engineering.
- Cloud experience with Azure (preferred), AWS, or GCP.
- Strong Linux, networking, and distributed systems troubleshooting.
- Kubernetes/EKS/AKS containers orchestration; Terraform IaC.
- Python, Go, Bash, or C#/.NET scripting.
- Observability with Datadog, Prometheus, Grafana, or OpenTelemetry.
π Benefits
- Flexible vacation
- 12 company-paid holidays
- 10 paid sick days
- Health, dental, and vision insurance
- 401(k) with matching
- Equity
Meet JobCopilot: Your Personal AI Job Hunter
Automatically Apply to Engineering Jobs. Just set your
preferences and Job Copilot will do the rest β finding, filtering, and applying while you focus on what matters.
Help us maintain the quality of jobs posted on Empllo!
Is this position not a remote job?
Let us know!