Related skills
React, Terraform, AWS, Python, Kubernetes
Description
- Own fleet reliability for Portal's SaaS infra, incl. LLM workflows.
- Define SLOs and capacity plans to scale our product.
- Architect infra on GCP and AWS using Terraform for AI workloads.
- Drive incident management, on-call, postmortems; enable self-healing.
- Lead full-stack reliability across TypeScript, React, and Python.
- Mentor engineers and shape infra roadmap with AI features.
Requirements
- 5+ years operating cloud infra (GCP/AWS), with Terraform and Kubernetes.
- Experience with LLM-based systems, RAG pipelines, or agentic workloads.
- Distributed systems: consistency, availability, partition tolerance.
- Proficient in TypeScript, Java, Go, or Python; able to navigate large codebases, including AI PRs.
- Build automation to prevent operational issues.
- Clear communicator; write postmortems that drive change.
Benefits
- Health insurance
- Six-month paid parental leave
- 401(k) retirement plan
- Monthly meal allowance
- 23 paid days off
- Paid holidays and sick leave