Senior Engineer 2: Inference Optimizations

Added: 3 minutes ago
Type: Full time

Related skills

CUDA, TensorRT, GPU, ROCm, Transformer

📋 Description

  • Lead benchmarking and performance optimizations for inference engines and GPU kernels.
  • Deep-dive into attention, memory, and precision optimizations and multi-node GPU setups.
  • Proactively implement cutting-edge optimization techniques for Gen AI workloads.
  • Master GPU hardware and software stacks (CUDA, ROCm, TensorRT, Triton).
  • Mentor through code and design reviews to raise the team's bar.
  • Collaborate with Product/TPMs to translate hardware limits into features.

🎯 Requirements

  • 5+ years in high-performance computing or AI infrastructure.
  • Gen AI literacy across LLM, VLM, and LMM architectures.
  • Optimization expert in attention layers and distributed GPU parallelism.
  • Hardware fluency with NVIDIA/AMD GPUs; CUDA/ROCm ecosystems.
  • Open source experience: build, integrate, contribute.
  • Strong systems design for low-level GPU programming and memory access.

🎁 Benefits

  • Reimbursement for conferences, training, and education.
  • LinkedIn Learning access to 10,000+ courses.
  • EAP and local employee meetups; flexible time off.
  • Salary range based on market data; potential bonus.
  • Equity grants and Employee Stock Purchase Program.
  • DigitalOcean is an equal-opportunity employer.