Related skills
azure scripting aws python gcp๐ Description
- Monitor and improve reliability of production AI systems and workflow infra
- Identify, diagnose, and fix issues across apps, integrations, and cloud
- Own incident response incl. root cause analysis and long-term remediation
- Implement monitoring, alerting, and observability to ensure uptime
- Collaborate with Engineering to harden deployments and scale systems
- Support customer-facing teams by troubleshooting live environment issues
- Document configurations, operational procedures, and recovery protocols
- Continuously improve reliability standards, deployment practices, and safeguards
๐ฏ Requirements
- 3+ years of experience in support engineering, site reliability engineering, or infrastructure maintenance
- Strong proficiency in Python or scripting languages
- Experience managing cloud infrastructure (AWS, GCP, or Azure)
- Strong problem-solving skills and a proactive, preventative mindset
- Clear communication skills and ability to collaborate across engineering and customer-facing teams
- High ownership and accountability in high-reliability environments
Meet JobCopilot: Your Personal AI Job Hunter
Automatically Apply to Engineering Jobs. Just set your
preferences and Job Copilot will do the rest โ finding, filtering, and applying while you focus on what matters.
Help us maintain the quality of jobs posted on Empllo!
Is this position not a remote job?
Let us know!