Lead benchmarking and performance optimization for the inference engine and GPU kernels.
Optimize memory bandwidth and compute utilization across multi-node GPU clusters.
Implement cutting-edge optimization techniques (AITER tuning, kernel fusion, MoE routing).
Serve as the subject-matter expert (SME) on GPUs and their software stacks (CUDA, ROCm, TensorRT, OpenAI Triton).
Mentor engineers through high-quality code and design reviews to raise the technical bar.
Partner with Product and TPMs to translate hardware limits into shipped features.
