Member of Technical Staff, Inference

Added: 4 days ago
Type: Full time

Related skills

Terraform, AWS, Python, Kubernetes, PyTorch

📋 Description

  • Take model checkpoints from research to production, owning the pipeline end-to-end
  • Build and optimize large-scale multi-GPU inference systems
  • Design/implement diffusion-model serving infra for real-time workflows
  • Add monitoring/observability for new releases: errors, throughput, GPU utilization, latency
  • Collaborate with research teams to collect data and support model development
  • Explore/integrate GPU inference providers (Modal, E2E, Baseten)
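The monitoring/observability responsibility above can be illustrated with a minimal sketch: a rolling per-request latency and error tracker of the kind an inference service might expose. All names here are illustrative, not part of the role's actual stack.

```python
from collections import deque


class InferenceMetrics:
    """Rolling window of per-request latencies for a serving endpoint.

    Tracks latency percentiles and error rate over the last `window`
    successful requests -- a minimal stand-in for what a real system
    would export to Prometheus/Grafana.
    """

    def __init__(self, window: int = 1000):
        self.latencies_ms = deque(maxlen=window)
        self.errors = 0
        self.requests = 0

    def record(self, latency_ms: float, ok: bool = True) -> None:
        """Record one request; failed requests count toward the error rate only."""
        self.requests += 1
        if ok:
            self.latencies_ms.append(latency_ms)
        else:
            self.errors += 1

    def snapshot(self) -> dict:
        """Return p50/p95 latency and error rate for the current window."""
        lat = sorted(self.latencies_ms)
        if not lat:
            return {"p50_ms": None, "p95_ms": None, "error_rate": 0.0}
        p50 = lat[len(lat) // 2]
        p95 = lat[min(len(lat) - 1, int(len(lat) * 0.95))]
        return {
            "p50_ms": p50,
            "p95_ms": p95,
            "error_rate": self.errors / self.requests,
        }
```

A production version would export these as labeled gauges per model release, alongside GPU utilization sampled from NVML, so regressions in a new checkpoint surface immediately.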

🎯 Requirements

  • 4+ years of ML model inference at scale in production
  • Strong experience with PyTorch and multi-GPU inference for large models
  • Kubernetes for ML workloads: deploying, scaling, and debugging GPU-based services
  • Comfortable working across multiple cloud providers and managing GPU driver compatibility
  • Experience with monitoring/observability for ML systems (errors, throughput, GPU utilization)
  • Self-starter who can work embedded with research teams and move fast
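One core technique behind "inference at scale" is dynamic batching: amassing concurrent requests into a single forward pass to keep GPUs saturated. A minimal sketch of the queue-draining step (function and parameter names are illustrative):

```python
import queue


def collect_batch(q: "queue.Queue", max_batch: int = 8, timeout_s: float = 0.01):
    """Drain up to max_batch requests from a shared queue.

    Blocks until at least one request arrives, then keeps pulling
    until the batch is full or the queue stays empty for timeout_s.
    The trade-off: a longer timeout raises GPU utilization (bigger
    batches) at the cost of added tail latency per request.
    """
    batch = [q.get()]  # block for the first request
    while len(batch) < max_batch:
        try:
            batch.append(q.get(timeout=timeout_s))
        except queue.Empty:
            break  # no more pending requests; run with what we have
    return batch
```

In a real server this loop feeds a single batched model call (e.g. one stacked tensor through a PyTorch model), with the timeout tuned against the latency budget.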

🎁 Benefits

  • Salary range: $240,000-290,000 USD
  • Remote-friendly, globally distributed team
  • Opportunity to impact millions of users