Senior/Principal AI Performance Engineer

Added: 13 hours ago
Type: Full time
Salary: not provided

Related skills

DeepSpeed, ONNX Runtime, vLLM, TensorRT-LLM, LLM inference optimization

📋 Description

  • Design and fine-tune inference pipelines for LLMs to maximize throughput and minimize latency.
  • Apply quantization, pruning, speculative decoding, batching, and kernel fusion.
  • Optimize inference-serving stacks (vLLM, TensorRT-LLM, ONNX Runtime) for production.
  • Profile and tune GPU/accelerator utilization across the full inference stack.
  • Design and optimize distributed training pipelines for large-scale models.
  • Tune training efficiency with mixed-precision, checkpointing, and optimizer improvements.

🎯 Requirements

  • Deep expertise in LLM inference optimization, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime).
  • Strong background in distributed AI training with PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA.
  • Experience packaging AI environments for reproducible deployment (containers, Apptainer/Singularity).
  • Fluency with GPU profiling tools: NVIDIA Nsight, PyTorch Profiler, CUDA analysis.
  • Familiarity with HPC environments: Slurm, PBS, RDMA/InfiniBand, MPI.
  • Experience integrating AI workloads into CI/CD pipelines with automated testing and benchmarking.
  • Comfort using LLM-based tools and agentic frameworks; strong analytical and communication skills.

🎁 Benefits

  • Medical, dental, and vision insurance.
  • Flexible paid time off.
  • Employee stock options.
  • Remote work; no travel required for most positions.