Related skills
docker python kubernetes pytorch deepspeed
Description
- Create flexible, performant ML infrastructure
- Design ML cloud infra for massive-scale modeling and analytics
- Support model exploration, hyperparameter optimization, pretraining and fine-tuning
- Build scalable distributed training pipelines (sharding, cross-GPU)
- Create, operate, and maintain ML platforms across the model lifecycle
- Make architecture decisions balancing performance, cost, reliability, and scalability
Requirements
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field
- 5+ years of experience in software engineering, large-scale data infrastructure, or ML systems
- Extensive proficiency in Python
- Familiarity with PyTorch
- Experience with distributed-training frameworks (FSDP, DeepSpeed, Megatron-LM, Ray)
- Experience building or optimizing ML training pipelines for transformers or large neural-network models
Benefits
- Collaborative, high-impact environment
- Comprehensive benefits package
- 401(k) plan with matching contributions