Related skills
cuda, tensorrt, rocm, moe, openai triton
Description
- Lead benchmarking and performance optimizations for inference engine and GPU kernels.
- Optimize memory bandwidth and compute utilization across multi-node GPU clusters.
- Implement cutting-edge optimization techniques (AITER tuning, kernel fusion, MoE routing).
- Serve as the subject-matter expert (SME) on GPUs and their software stacks (CUDA, ROCm, TensorRT, OpenAI Triton).
- Mentor through high-quality code and design reviews to raise the technical bar.
- Partner with Product and TPMs to translate hardware limits into shipped features.