Design and fine-tune inference pipelines for LLMs to maximize throughput and minimize latency.
Apply optimization techniques such as quantization, pruning, speculative decoding, batching, and kernel fusion.
Optimize inference-serving stacks (vLLM, TensorRT-LLM, ONNX Runtime) for production.
Profile and tune GPU/accelerator utilization across the full inference stack.
Design and optimize distributed training pipelines for large-scale models.
Improve training efficiency through mixed-precision training, checkpointing, and optimizer improvements.

Deep expertise in LLM inference optimization, including serving frameworks (vLLM, TensorRT-LLM, ONNX Runtime).
Strong background in distributed AI training with PyTorch FSDP, DeepSpeed, Megatron-LM, or JAX/XLA.
Experience packaging AI environments for reproducible deployment (containers, Apptainer/Singularity).
Fluency with GPU profiling tools such as NVIDIA Nsight and PyTorch Profiler, and with CUDA-level performance analysis.
Familiarity with HPC environments and technologies: Slurm, PBS, RDMA/InfiniBand, MPI.
Experience integrating AI workloads into CI/CD pipelines with automated testing and benchmarking.
Comfort using LLM-based tools and agentic frameworks; strong analytical and communication skills.

Senior/Principal AI Performance Engineer