Related skills
python kubernetes go cloud networking load balancing
📋 Description
- Participate in 24/7 on-call rotations for core infrastructure systems
- Execute incident response during production events, including triage and recovery
- Improve runbooks, operational procedures, and escalation paths
- Improve the reliability of core infrastructure (Kubernetes/GKE, cloud networking, edge)
- Automate repetitive operational and security tasks
🎯 Requirements
- 4+ years of experience operating large-scale systems
- Experience leading incident response or reliability initiatives
- Ability to identify systemic issues and propose long-term fixes
- Comfortable mentoring junior engineers and influencing peers
- Experience supporting or operating production systems
- Strong troubleshooting and communication skills