Related skills
AWS, Python, Kubernetes, Go, LangChain
Description
- Own and define the long-term reliability strategy and architecture.
- Design planet-scale, highly resilient systems on AWS and Kubernetes.
- Lead autonomous operations platforms powered by AI agents.
- Architect and implement LLM-driven SRE systems using the OpenAI API.
- Establish gold standards for SRE: SLOs, SLAs, incident management.
- Drive observability architecture at scale: metrics, logs, traces.
Requirements
- 10+ years in SRE/Platform/Distributed Systems.
- Designed and operated large-scale distributed systems.
- Experience with AWS at scale and Kubernetes internals, with a focus on reliability.
- Strong Python/Go programming for platforms/tools.
- Led cross-functional technical initiatives.
- Experience integrating LLMs into production (OpenAI API) and AI automation.
Benefits
- Tremendous growth and learning opportunities.
- Challenging, rewarding work with impact.
- Welcoming, positive work environment.
- Inclusive, equal opportunity employer.
- Work with an AI-native platform and leading brands.