Type
Full time
Related skills
python, kubernetes, go, multi-cloud, gpu
Description
- Lead Specialized Pods: Act as the lead for specific GPU pods (e.g., H100 or B200), managing the full lifecycle of acquisition, air traffic control, and maintenance for those assets.
- Advanced Orchestration: Execute complex workload migrations and sticky deployment drains, ensuring deployment scheduling rules meet strict regional and compliance requirements.
- Build for Scalability: Design and implement the next version of Baseten's capacity management system to handle a 10x increase in GPU volume.
- Financial Modeling: Leverage your understanding of unit economics to build ROI models for GPU spend, ensuring Baseten scales profitably.
- Cross-Team Collaboration: Partner with SRE, Infra, and FDE teams to take discrete operational tasks off their plates and verify last-mile follow-through on infrastructure changes.
- Incident Response: Lead capacity-crunch response by rapidly untainting and re-coordinating workloads during high-pressure outages.
Requirements
- Bachelor's, Master's, or Ph.D. in CS, Engineering, Mathematics, or a related field
- 5+ years in a high-growth environment, preferably at a hyperscaler or GPU provider
- Deep Kubernetes expertise: taints, cordons, draining, and custom operators
- Production experience with Go or Python; ability to model ROI for capacity reliability and cost
- High tenacity and collaborative mindset
Benefits
- Competitive compensation and equity
- 100% medical, dental, and vision for you and dependents
- Generous PTO including Winter Break
- Paid parental leave
- 401(k) plan
- Exposure to ML startups and learning opportunities