Related skills
javascript java linux aws python
Description
- Improve reliability, fault tolerance, scalability, and performance
- Manage incidents, involving the relevant teams, and automate away manual practices
- Respond to automated alerts via on-call rotation
- Define and maintain SLIs, SLOs, SLAs, and error budgets
- Improve observability (metrics, logs, and tracing) to speed up incident detection
- Lead postmortems and drive follow-up actions to prevent repeat incidents
Requirements
- Experience designing and operating scalable, reliable systems on AWS or a similar cloud platform
- Experience handling on-call shifts for critical systems
- Experience with chaos engineering (Gremlin)
- Able to debug live production systems
- Enjoy writing and deploying code with zero downtime
- Experience with scripting and/or software development (Linux shell, Python, JavaScript, Java)
Benefits
- Customer-first mindset and transparency with users
- Collaborative, dynamic team culture that perseveres
- Commitment to continuous improvement and experimentation