Site Reliability Engineer

Added: 3 days ago
Type: Full time

Related skills

Docker, Terraform, AWS, Grafana, Prometheus

📋 Description

  • Automate tasks and infra with Python/Go to reduce toil
  • Design scalable fault-tolerant infra on AWS/GCP/Azure
  • Own reliability, performance, and SLOs for core services
  • Own observability stack for monitoring, logging, alerts
  • Lead incident response, post-mortems, root cause analyses
  • Collaborate with product/engineering on reliable system design

🎯 Requirements

  • 7+ years in SRE/DevOps for high-availability systems
  • Cloud expertise (AWS preferred), Docker/Kubernetes, Terraform
  • Python/Java/Go for automation and monitoring
  • Prometheus, Grafana, ELK Stack for monitoring/logging
  • Challenge the status quo; propose reliability improvements
  • Ownership and accountability for mission-critical systems

🎁 Benefits

  • Generous PTO plus company holidays
  • Medical, dental, and vision coverage for you and family
  • Paid parental leave (12 weeks)
  • Fertility and family planning support
  • HSA with company contribution
  • 401k with company stock options