Type: Full time

Related skills: Datadog, Terraform, AWS, Grafana, Prometheus

📋 Description

  • Build self-service platform infrastructure for product teams.
  • Automate repetitive manual tasks to reduce toil.
  • Implement monitoring, alerts, and dashboards for reliability.
  • Participate in the on-call rotation, responding to incidents and strengthening system resilience.
  • Plan capacity and optimize performance for scale.
  • Collaborate with security, product engineering, and SRE teams.

🎯 Requirements

  • 5+ years of experience running distributed systems and microservices in production.
  • Strong AWS experience (EC2, ECS/EKS, VPC, IAM), including multi-AZ architectures.
  • Fluency in Terraform or CloudFormation; you think in code, not clickops.
  • Go or Python for building automation tooling.
  • Production experience operating multi-tenant Kubernetes clusters.
  • Observability experience with Prometheus, Grafana, and Datadog.
  • Incident response experience: on-call, outages, postmortems.
  • Security-minded approach: least-privilege, encryption, threat models.

🎁 Benefits

  • Autonomy and ownership over architectural decisions.
  • Modern stack with current tooling.
  • Sustainable on-call with fair rotation.
  • Collaborative culture with design reviews and knowledge sharing.
  • Remote-first with async communication.
  • Travel for in-person engagement may be required.