Principal Site Reliability Engineer (Platform Tribe)

Added: 24 days ago
Type: Full time
Salary: Not provided

Related skills

docker terraform aws grafana prometheus

📋 Description

  • Manage day-to-day alerts, system checks, and issue escalation.
  • Provide 24x7 on-call support for critical SaaS events.
  • Document issues and remediation steps.
  • Proactively create monitors within the EKS/K8s ecosystem.
  • Deploy to EKS/K8s cluster using Terraform and Helm/Flux.
  • Enhance infrastructure health with checks and scripts.

🎯 Requirements

  • Kubernetes deployment, scaling, troubleshooting.
  • FluxCD/ArgoCD configuration management.
  • Experience with root cause analysis (RCA) and postmortems.
  • AWS, Terraform, Docker, CI/CD.
  • Monitoring: Datadog, Prometheus, Grafana.
  • Logging: Elasticsearch/Logstash/Kibana or AWS CloudWatch.

๐ŸŽ Benefits

  • Competitive salary and annual reviews.
  • Bonus system (15-20%), paid quarterly.
  • Unlimited vacation and paid sick leave.
  • Flexible work schedule.
  • 100% Remote.
  • Financial support for life events and extended parental leave.