Related skills
datadog, bash, aws, python, kubernetes
Description
- Monitor GTO dashboard in Datadog; detect anomalies across signals.
- Create incident tickets in Jira Service Management; triage issues.
- Own low-severity incidents end-to-end; resolve with runbooks.
- Support the TSO Lead during major incidents; surface real-time data.
- Analyze incident trends; compile reports for product and engineering.
- Build automation and runbooks; enable repeatable resolutions.
Requirements
- 4+ years in SRE, DevOps, or production operations in a high-availability environment.
- Strong troubleshooting skills; able to trace issues across the stack: logs, APM, infrastructure, and databases.
- Hands-on Datadog experience; able to navigate APM, logs, dashboards, monitors, and SLOs.
- Proficiency in Python, Go, or Bash for automation.
- Clear written and verbal English communication.
- Experience with Kubernetes and cloud infrastructure (GCP preferred; AWS/Azure acceptable).
Benefits
- Medical, dental, and vision coverage; PTO
- Career roadmap and professional development opportunities
- Supportive team culture and inclusive environment