Senior Engineer 2: Inference Optimizations

Added
11 minutes ago
Type
Full time

Related skills

cuda tensorrt rocm transformer openai triton

📋 Description

  • Lead benchmarking and performance optimization at the inference-engine and GPU-kernel levels.
  • Engineer solutions for memory bandwidth and compute bottlenecks across multi-node GPUs.
  • Implement cutting-edge Gen AI optimization techniques to stay ahead.
  • Partner with Product Management and TPMs to translate hardware limits into features.
  • Maintain a strong presence in GPU performance and open-source communities.

🎯 Requirements

  • 5+ years in HPC/AI infra solving compute and memory bottlenecks.
  • Gen AI literacy across LLM/VLM/LMM and major model families.
  • Hands-on attention-layer optimization and distributed GPU parallelization.
  • Hardware fluency: NVIDIA/AMD GPUs and CUDA/ROCm ecosystems.
  • Open source mastery: build/contribute to OSS projects.
  • Systems design: low-level GPU programming, memory patterns, parallel execution.

๐ŸŽ Benefits

  • Career development with conferences, training, and education support.
  • Global benefits: EAP, local meetups, flexible time off.
  • LinkedIn Learning access for growth and skills.
  • Equity and Employee Stock Purchase Program.