Added
4 days ago
Type
Full time

Related skills

datadog bash aws python kubernetes

📋 Description

  • Monitor GTO dashboard in Datadog; detect anomalies across signals.
  • Create incident tickets in Jira Service Management; triage issues.
  • Own low-severity incidents end-to-end; resolve with runbooks.
  • Support the TSO Lead during major incidents; surface real-time data.
  • Analyze incident trends; compile reports for product and engineering.
  • Build automation and runbooks; enable repeatable resolutions.

🎯 Requirements

  • 4+ years in SRE/DevOps/production operations in high-availability environments.
  • Strong troubleshooting; trace issues across stack: logs, APM, infra, DB.
  • Hands-on Datadog; navigate APM, logs, dashboards, monitors, SLOs.
  • Proficiency in Python, Go, or Bash for automation.
  • Clear written and verbal English communication.
  • Kubernetes and cloud infra (GCP preferred; AWS/Azure acceptable).

🎁 Benefits

  • Medical, dental, and vision coverage; PTO
  • Career roadmap and professional development opportunities
  • Supportive team culture and inclusive environment