Related skills
terraform · aws · python · kubernetes · pytorch

📋 Description
- Take model checkpoints from research to production end-to-end
- Build and optimize large-scale multi-GPU inference systems
- Design/implement diffusion-model serving infra for real-time workflows
- Add monitoring/observability for new releases: errors, throughput, GPU, latency
- Collaborate with research teams to collect data and support model development
- Explore/integrate GPU inference providers (Modal, E2E, Baseten)
🎯 Requirements
- 4+ years of ML model inference at scale in production
- Strong experience with PyTorch and multi-GPU inference for large models
- Kubernetes for ML workloads: deploying, scaling, and debugging GPU-based services
- Comfortable working across multiple cloud providers and managing GPU driver compatibility
- Experience with monitoring/observability for ML systems (errors, throughput, GPU utilization)
- Self-starter who can work embedded with research teams and move fast
🎁 Benefits
- Salary range: $240,000-290,000 USD
- Remote-friendly, globally distributed team
- Opportunity to impact millions of users