Related skills: Datadog, AWS, Prometheus, Python, Kubernetes
Description
- Own and evolve observability strategy: monitoring, alerting, dashboards, logging, tracing.
- Define and manage SLIs, SLOs, and reliability metrics.
- Lead incident response, postmortems, and continuous improvement.
- Improve MTTD and MTTR through automation and operational excellence.
- Integrate observability into CI/CD pipelines and software delivery workflows.
- Build and maintain reliable cloud infrastructure on AWS and Kubernetes.
Requirements
- 8+ years in software engineering, infrastructure, or operations.
- 5+ years of Site Reliability Engineering experience.
- Deep expertise with observability platforms (New Relic, Datadog, Dynatrace, Grafana, Prometheus).
- Strong experience in monitoring, alerting, incident management, and reliability engineering.
- Hands-on experience with AWS, Kubernetes, and cloud-native technologies.
- Proficiency in Python, Bash, PowerShell, or similar scripting languages.
- Excellent communication skills.
Benefits
- Medical, Dental, and Vision Insurance for full-time employees.
- Competitive pay.
- Maternity and paternity leave for full-time staff.
- Short- and long-term disability insurance.
- Opportunity to learn from a dedicated leadership team.
- Top-of-the-line company swag.