Added: 19 hours ago
Type: Full time

Related skills

datadog bash aws python kubernetes

📋 Description

  • Monitor the GTO dashboard in Datadog; detect anomalies across signals.
  • Triage and investigate incidents; create Jira tickets; perform checks.
  • Own low-severity incidents end-to-end; diagnose and resolve.
  • Support TSO Lead during major incidents; surface real-time data.
  • Draft incident communications and publish status updates (status.xsolla.com).
  • Analyze incidents in non-incident periods; identify trends for teams.

🎯 Requirements

  • 4+ years in SRE/DevOps/production ops in high-availability environments.
  • Ability to troubleshoot across logs, APM, infrastructure metrics, and databases.
  • Datadog (or equivalent) experience; build queries, dashboards, alerts.
  • Scripting in Python/Go/Bash for automation.
  • Kubernetes and cloud infra (GCP preferred; AWS/Azure acceptable).
  • Clear English communication for incident updates and handoffs.

🎁 Benefits

  • Flexible hours and no dress code; comfortable office environment.
  • Latest Mac workstation and hardware.
  • Access to Google tools (Chat, Gmail, Drive) and Confluence/Jira/GitLab.
  • Professional growth: training and conference opportunities.
  • Health insurance for employee and dependents.