TL, Research Inference

Added: 22 days ago
Type: Full time

Related skills

scheduling, profiling, GPU kernels, inference runtimes

📋 Description

  • Design and build high-performance inference runtimes for large-scale AI models.
  • Own and optimize core execution paths: model execution, memory, batching, scheduling.
  • Develop and improve distributed inference across multiple GPUs and runtime coordination.
  • Implement and optimize inference-critical operators and kernels.
  • Partner with research teams to ensure new model architectures are supported in inference.
  • Diagnose and resolve performance bottlenecks via profiling and debugging.
  • Contribute to observability, correctness, and reliability of large-scale AI systems.
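To make the batching and scheduling responsibilities above concrete, here is a minimal, illustrative sketch of a dynamic-batching request loop of the kind an inference runtime might use. All names (`Request`, `collect_batch`, `run_batch`) and the batch-size/wait parameters are hypothetical, not taken from the posting:

```python
import queue
import time
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical inference request: prompt tokens plus arrival time.
    tokens: list
    arrival: float = field(default_factory=time.monotonic)


def collect_batch(q, max_batch_size=8, max_wait_s=0.005):
    """Collect up to max_batch_size requests, waiting at most max_wait_s
    after the first request arrives (a simple dynamic-batching policy
    trading a little latency for higher GPU utilization)."""
    batch = [q.get()]  # block until at least one request is available
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def run_batch(batch):
    # Stand-in for model execution: return the token count per request.
    return [len(r.tokens) for r in batch]


if __name__ == "__main__":
    q = queue.Queue()
    for toks in ([1, 2], [3], [4, 5, 6]):
        q.put(Request(tokens=toks))
    print(run_batch(collect_batch(q)))
```

Real runtimes layer continuous batching, KV-cache management, and multi-GPU coordination on top of this basic pattern; the sketch only shows the core size-vs-latency scheduling tradeoff.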

🎯 Requirements

  • Experience building production inference systems, not just training or running models.
  • Experience with GPU-centric performance engineering, including memory behavior and latency/throughput tradeoffs.
  • Experience with multi-GPU or distributed systems involving batching, scheduling, or runtime coordination.
  • Ability to reason end-to-end about inference pipelines, from request handling through execution to output streaming.
  • Ability to implement research ideas within real system and performance constraints.
  • An appetite for hard, ambiguous systems problems that emerge at scale.
  • A preference for hands-on technical ownership and execution over abstract design work.