Senior Engineer 2: Inference Optimizations

Added: 3 minutes ago
Type: Full time

Related skills

CUDA, TensorRT, ROCm, Transformer, MoE

📋 Description

  • Lead benchmarking and performance optimization for inference engines.
  • Engineer solutions for memory bandwidth and compute bottlenecks.
  • Implement cutting-edge optimization techniques to lead the Gen AI landscape.
  • Improve performance across batch sizes; tune AITER CK/ASM kernels for FP8/BF16.
  • Identify kernel fusion opportunities in GLM-5 Transformer blocks.
  • Tune gating/router kernels for MoE models like Qwen3-235B.

🎯 Requirements

  • 5+ years in HPC or AI infra solving compute and memory bottlenecks.
  • Gen AI literacy across LLM/VLM/LMM landscapes.
  • Optimization expert: attention layers and distributed GPU parallelism.
  • Hardware fluency with NVIDIA/AMD GPUs and CUDA/ROCm.
  • Open source mastery; a track record of contributing to OSS projects.
  • Systems design: low-level GPU programming and memory patterns.

🎁 Benefits

  • We innovate with purpose and ship impactful AI tech.
  • Career development resources including conferences and courses.
  • Well-being support: EAP, local meetups, flexible time off.
  • Equal opportunity employer; inclusive, diverse culture.
  • Global remote-friendly culture with ownership and accountability.