Related skills
gitops terraform kubernetes webrtc llm
Description
- Design, build, and maintain AI inference infra with high throughput and low latency.
- Own end-to-end deployment pipelines for real-time vision and LLMs.
- Architect and scale GPU-enabled Kubernetes clusters with autoscaling.
- Build WebRTC-based infra for real-time AI agents (STT/TTS) at low latency.
- Drive inference scaling with speculative decoding, batching, and model parallelism.
- Implement Terraform and GitOps for GPU AI environments.
Requirements
- 5+ years of infrastructure engineering experience, with 2+ years of AI/ML in production.
- Strong Kubernetes skills for GPU workloads, including scheduling and autoscaling.
- Hands-on with model serving and inference optimization for CV and LLM.
- Inference optimization: speculative decoding, batching, quantization, scaling.
- Experience provisioning infra for real-time AI systems including WebRTC clusters.
- Familiarity with real-time video CV inference pipelines and low-latency STT/TTS.
- IaC (Terraform or similar) and GitOps for GPU environments.
- Fluent in English.
Benefits
- Stimulating, fast-paced environment with room for creativity.
- Competitive salary and growth at a high-tech startup.
- Flexible remote work with unlimited vacation.
- Health and well-being program (digital therapist).
- Remote-first company with flexible hours.
- Learn more about our tech stack on StackShare.