Related skills
python, kubernetes, tensorflow, pytorch, mlops
📋 Description
- Build and operate data infrastructure at scale
- Scale massive distributed compute and storage systems
- Architect multi-cluster orchestration across hardware and regions
- Design future-proof storage for exabyte-scale data
- Develop an internal training platform on Kubernetes/SLURM
- Implement production-grade data pipelines and on-call rotations
🎯 Requirements
- 4+ years in Data Infrastructure, MLOps, or Infrastructure Engineering
- Proficient in Python, with experience building scalable data pipelines
- Experience with Kubernetes-native tooling and multi-cluster orchestration
- Distributed training experience with ML platforms (DeepSpeed, FSDP, SLURM, Kubernetes)
- Familiar with PyTorch, JAX, or TensorFlow; CUDA is a plus
- Strong software design, testing, and CI/CD skills; collaborative
🎁 Benefits
- Competitive salary and equity
- Healthcare for you and family
- 401(k) with 6% matching
- 18 days of PTO
- Visa sponsorship
- BetterUp coaching