Capacity Ops Engineer

Added: less than a minute ago
Type: Full time

Related skills

Python, Kubernetes, Go, multi-cloud, GPU

📋 Description

  • Lead Specialized Pods: Act as the lead for specific GPU pods (e.g., H100 or B200), managing the full lifecycle of acquisition, air traffic control, and maintenance for those assets.
  • Advanced Orchestration: Execute complex workload migrations and sticky deployment drains, ensuring deployment scheduling rules meet strict regional and compliance requirements.
  • Build for Scalability: Design and implement the next version of Baseten’s capacity management system to handle a 10x increase in GPU volume.
  • Financial Modeling: Leverage your understanding of unit economics to build ROI models for GPU spend, ensuring Baseten scales profitably.
  • Cross-Team Collaboration: Partner with SRE, Infra, and FDE teams to take discrete operational tasks off their plates and verify last-mile follow-through on infrastructure changes.
  • Incident Response: Lead capacity-crunch response by rapidly untainting and re-coordinating workloads during high-pressure outages.

🎯 Requirements

  • Bachelor’s, Master’s, or Ph.D. in CS, Engineering, Mathematics, or a related field
  • 5+ years of experience in a high-growth environment, preferably at a hyperscaler or GPU provider
  • Deep Kubernetes expertise: taints, cordons, draining, and custom operators
  • Production experience with Go or Python; ability to model ROI for capacity reliability and cost
  • High tenacity and collaborative mindset

šŸŽ Benefits

  • Competitive compensation and equity
  • 100% medical, dental, and vision coverage for you and your dependents
  • Generous PTO including Winter Break
  • Paid parental leave
  • 401(k) plan
  • Exposure to ML startups and learning opportunities