Related skills
datadog bash aws python kubernetes
Description
- Monitor the GTO dashboard in Datadog; detect anomalies across signals.
- Triage and investigate incidents; create Jira tickets; perform checks.
- Own low-severity incidents end-to-end; diagnose and resolve.
- Support TSO Lead during major incidents; surface real-time data.
- Draft incident communications and publish status updates (status.xsolla.com).
- Analyze past incidents between active incidents; identify trends for teams.
Requirements
- 4+ years in SRE/DevOps/production ops in high-availability environments.
- Troubleshoot across logs, APM, infra metrics, DB.
- Datadog (or equivalent) experience; build queries, dashboards, alerts.
- Scripting in Python/Go/Bash for automation.
- Kubernetes and cloud infra (GCP preferred; AWS/Azure acceptable).
- Clear English communication for incident updates and handoffs.
Benefits
- Flexible hours and no dress code; comfortable office environment.
- Latest Mac workstation and hardware.
- Access to Google tools (Chat, Gmail, Drive) and Confluence/Jira/GitLab.
- Professional growth: trainings and conference opportunities.
- Health insurance for employee and dependents.