Site Reliability Engineer

Added: 13 days ago
Type: Full-time
Salary: Not provided

Related skills

JavaScript, Java, Linux, AWS, Python

📋 Description

  • Improve reliability, fault tolerance, scalability, and performance
  • Manage incidents, coordinate the teams involved, and automate away manual practices
  • Respond to automated alerts via on-call rotation
  • Define and maintain SLIs, SLOs, SLAs, and error budgets
  • Improve observability (metrics/logs/tracing) to speed detection
  • Lead postmortems and follow-ups to prevent repeat incidents

🎯 Requirements

  • Experience designing/operating scalable, reliable systems in AWS or similar
  • Experience handling on-call shifts for critical systems
  • Experienced with chaos engineering (Gremlin)
  • Able to debug live production systems
  • Enjoy writing and deploying code with no downtime
  • Experience with scripting and/or development (Linux shell, Python, JavaScript, Java)

🎁 Benefits

  • Customer-first mindset and transparency with users
  • Collaborative, dynamic team culture that perseveres
  • Commitment to continuous improvement and experimentation