Related skills
terraform, python, kubernetes, go, cdk

📋 Description
- Design and operate GPU infrastructure for model hosting and scheduling.
- Build and scale model serving with vLLM, TensorRT-LLM, and Triton for real-time inference.
- Implement multi-model routing across modalities on shared infrastructure.
- Own end-to-end model lifecycle: download, deploy, serve, monitor, scale.
- Drive inference optimization: quantization, batching, caching, cold-start reduction.
- Build self-service platforms to provision compute, storage, and model endpoints via APIs.
🎯 Requirements
- 8+ years software engineering; 3+ years building infra platforms or ML/AI infra.
- Deep experience with AWS, GCP, and Kubernetes.
- Hands-on with GPU workloads and model serving (vLLM, TensorRT-LLM, Triton).
- Proficiency in Python, Go, or C++.
- Infrastructure-as-code (IaC) experience: Terraform, Pulumi, CDK.
- Experience leading cross-team technical initiatives and influencing direction.