Type: Full time

Related skills

Datadog, Terraform, AWS, Grafana, Python

📋 Description

  • Design automated reliability and self-healing systems to protect production.
  • Own and improve incident management tooling and on-call health.
  • Develop observability infrastructure: monitoring, alerts, SLOs, latency visibility.
  • Contribute to AI-driven tooling for autonomous remediation.
  • Drive incident prevention by eliminating operational toil.
  • Partner with product engineering teams to diagnose reliability gaps.

🎯 Requirements

  • 8+ years designing and building software in teams.
  • Bachelor's degree in CS/Engineering or equivalent.
  • 3+ years in infrastructure and/or platform engineering.
  • Expertise in observability, reliability, and data analysis.
  • Experience with Datadog, New Relic, or Grafana.
  • Familiarity with AWS or GCP and IaC (Terraform).

🎁 Benefits

  • Flexible, employee-led remote model with optional in-person offices.
  • Professional development stipend and growth opportunities.
  • Comprehensive health and parental leave plans.
  • Equity and performance-based rewards in a high-growth company.