Senior Engineer 2: Inference Optimizations

Type
Full time

Related skills

cuda tensorrt rocm openai triton bf16

📋 Description

  • Lead benchmarking and performance optimizations for inference engines.
  • Drive attention, memory, and precision optimizations, plus multi-node GPU parallelism.
  • Apply cutting-edge Gen AI optimization techniques, including FP8 and BF16 precision.
  • Collaborate with Product/TPMs to translate hardware limits into features.
  • Mentor engineers via code/design reviews to raise the bar.
  • Contribute to open-source AI and GPU performance communities.

🎯 Requirements

  • Technical Depth: 5+ years in HPC or AI infrastructure.
  • Gen AI literacy across LLM/VLM/LMM architectures.
  • Attention-layer optimization and distributed GPU parallelism.
  • Hardware fluency with NVIDIA/AMD GPUs and CUDA/ROCm ecosystems.
  • Open source mastery: building with and contributing to OSS.
  • Systems design for low-level GPU programming and memory access patterns.

🎁 Benefits

  • We innovate with purpose—building for builders.
  • Career development resources and LinkedIn Learning access.
  • Well-being benefits and flexible time off for work-life balance.
  • Competitive compensation and equity opportunities.