AI Inference Engineer - Model Optimization & Deployment

Type
Full time

Related skills

Python, CUDA, TensorRT, TensorRT-LLM, PTQ

πŸ“‹ Description

  • Optimize large-scale models (LLMs/VLMs) with quantization (PTQ/QAT) and LoRA/QLoRA.
  • Architect model conversion/compilation pipelines using TensorRT for edge deployment (a minimal build sketch follows this list).
  • Validate output parity, drive accuracy recovery, and benchmark latency against PyTorch baselines and edge binaries.
  • Write and optimize CUDA kernels and TensorRT Plugins for high memory bandwidth and low latency.
  • Produce production-grade, concurrent C++ and Python code for real-time edge inference.
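
As a rough illustration of the conversion step above, the sketch below compiles an ONNX export into a TensorRT engine with FP16 and INT8 flags enabled. The model path, precision flags, and the commented-out calibrator are placeholders, and exact builder calls vary by TensorRT version; this is a sketch, not the team's actual pipeline.

```python
# Hypothetical sketch: build a TensorRT engine from an ONNX export with
# mixed-precision flags (TensorRT 8.x-style Python API). Paths and the
# calibrator are illustrative placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path: str, engine_path: str) -> None:
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 tactics
    config.set_flag(trt.BuilderFlag.INT8)  # PTQ path; needs a calibrator or Q/DQ nodes
    # config.int8_calibrator = MyEntropyCalibrator(...)  # placeholder PTQ calibrator

    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)

build_engine("model.onnx", "model.plan")
```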

🎯 Requirements

  • Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference (INT8, FP8, INT4, BF16/FP16).
  • Proven experience optimizing large-scale models (LLMs, VLMs, or VLAs) using KV-cache, Speculative Decoding, and Efficient Attention (FlashAttention, Linear Attention).
  • Extensive experience with model conversion/compilation pipelines (TensorRT, TensorRT-LLM) and benchmarking (a timing sketch follows this list).
  • Proficiency in low-level programming for AI accelerators: writing and optimizing custom CUDA kernels and TensorRT Plugins.
  • Production-level C++ (14/17/20) and Python programming for concurrent, memory-safe, real-time edge inference.
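
For the benchmarking requirement above, the minimal sketch below times a PyTorch module with CUDA events; the model, input shape, and warmup/iteration counts are assumptions for illustration, and the same harness shape applies when timing a deserialized TensorRT engine for comparison.

```python
# Hypothetical sketch: measure steady-state GPU latency of a PyTorch module
# with CUDA events. Model, input shape, and iteration counts are placeholders.
import torch

@torch.inference_mode()
def benchmark(model: torch.nn.Module, x: torch.Tensor,
              warmup: int = 20, iters: int = 100) -> float:
    model.eval().cuda()
    x = x.cuda()
    for _ in range(warmup):  # warm up allocator, autotuned kernels, caches
        model(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        model(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # mean latency in milliseconds

# Example: latency_ms = benchmark(my_model, torch.randn(1, 3, 224, 224))
```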

🎁 Benefits

  • Zoox Stock Appreciation Rights (SARs) and Amazon RSUs
  • Health, long-term care, disability, and life insurance
  • Paid time off, vacation, and sick leave