Related skills
python pytorch machine learning training benchmarking
Description
- Own the end-to-end training pipeline: ingestion, orchestration, checkpointing, and logging.
- Run large-scale experiments with reproducibility and traceability.
- Build automated evaluation pipelines to detect regressions across model checkpoints.
- Define metrics for accuracy, latency, memory usage, and domain coverage.
- Manage compute resources across parallel experiments for throughput and cost efficiency.
- Partner with Data teams to ensure training data improvements translate into model gains.
Requirements
- MS/PhD in CS/Engineering/Math or related field.
- 5+ years in ML, Applied AI, or related areas.
- Strong Python and PyTorch; distributed training with DeepSpeed and FSDP.
- Experience training and evaluating large-scale language and/or vision-language models.
- Ability to design end-to-end training pipelines, experiment tracking, and benchmarking.
- Strong collaboration and communication across cross-functional teams.
Benefits
- Comprehensive medical, accidental, and life insurance.
- Remote and hybrid working options to fit different lifestyles.
- Flexible hours across most teams.
- Two paid volunteering days off per year.
- Paid parental leave in all locations.
- Generous paid time off policy.