Related skills: scheduling, profiling, GPU kernels, inference runtimes

Description
- Design and build high-performance inference runtimes for large-scale AI models.
- Own and optimize core execution paths: model execution, memory, batching, scheduling.
- Develop and improve distributed inference across multiple GPUs and runtime coordination.
- Implement and optimize inference-critical operators and kernels.
- Partner with research teams to ensure new model architectures are supported in inference.
- Diagnose and resolve performance bottlenecks via profiling and debugging.
- Contribute to observability, correctness, and reliability of large-scale AI systems.
Requirements
- Have experience building production inference systems, not just training or running models.
- Have experience with GPU-centric performance engineering, including memory behavior and latency/throughput tradeoffs.
- Have worked on multi-GPU or distributed systems involving batching, scheduling, or runtime coordination.
- Can reason end-to-end about inference pipelines, from request handling to execution and output streaming.
- Can implement research ideas within real system and performance constraints.
- Enjoy solving hard, ambiguous systems problems that emerge at scale.
- Prefer hands-on technical ownership and execution over abstract design work.