Related skills: scheduling, profiling, GPU kernels, inference runtimes

Description
- Design and build high-performance inference runtimes for large-scale AI models.
- Own and optimize core execution paths: model execution, memory, batching, scheduling.
- Develop and improve distributed inference across multiple GPUs and runtime coordination.
- Implement and optimize inference-critical operators and kernels.
- Partner with research teams to ensure new model architectures are supported in inference.
- Diagnose and resolve performance bottlenecks via profiling and debugging.
- Contribute to observability, correctness, and reliability of large-scale AI systems.
Requirements
- Have experience building production inference systems, not just training or running models.
- Have experience with GPU-centric performance engineering, including memory behavior and latency/throughput tradeoffs.
- Have worked on multi-GPU or distributed systems involving batching, scheduling, or runtime coordination.
- Can reason end-to-end about inference pipelines, from request handling to execution and output streaming.
- Can implement research ideas within real system and performance constraints.
- Enjoy solving hard, ambiguous systems problems that emerge at scale.
- Prefer hands-on technical ownership and execution over abstract design work.