Related skills
python · kubernetes · pytorch · llm · cuda

📋 Description
- Port workloads to new platforms; ensure correctness and stability.
- Build benchmarks and stress tests for CPU/GPU/memory/storage/network.
- Deep-dive into the performance of distributed training/inference.
- NCCL/RCCL performance tuning; compute/communication overlap.
- Create repeatable CI/lab test harnesses with actionable outputs.
- Collaborate with systems/fleet engineers for stability and scalability.
🎯 Requirements
- BS in CS/EE or equivalent practical experience
- 5+ years in ML systems, performance engineering, distributed systems, or HPC
- Experience with PyTorch and modern LLM training/inference stacks
- Large-scale distributed training concepts (data/model/pipeline parallel, collective comms)
- Experience with RDMA and debugging/optimizing communication libraries (NCCL or RCCL)
- Python; C++/CUDA/HIP; profiling with Nsight/rocprof/perf