Related skills
networking, python, pytorch, distributed systems, apis
Description
- Build and maintain infrastructure for large-scale model training and experimentation.
- Design APIs and interfaces that make complex training workflows easier to express and harder to misuse.
- Improve reliability, debuggability, and performance across training and data pipelines.
- Debug issues spanning Python, PyTorch, distributed systems, GPUs, networking, and storage.
- Write tests, benchmarks, and diagnostics that catch meaningful regressions.
Requirements
- Strong systems instincts with a focus on performance, reliability, and clean abstractions.
- Comfortable working across ML research code and production infrastructure.
- Good taste in API and interface design with empathy for researchers.
- Able to debug across Python, PyTorch, distributed systems, GPUs, and networking.
- Able to write tests, benchmarks, and diagnostics that catch regressions.
- Proficient in Python and PyTorch for ML workflows.