Related skills
pytorch tensorrt ray gke nvidia triton
📋 Description
- Own Ray ecosystem end-to-end on GKE
- Operate Ray Train on multi-node H100 clusters
- Build LLM inference mesh with Ray Serve
- Optimize inference: fractional GPUs, batching, autoscaling
- Design model routing layer for multi-tenant LLMs
- Build RL training infra with Flyte and RLlib
🎯 Requirements
- ML engineering experience in an ML platform or MLOps role
- Production Ray depth: Train, Serve, Core, Data
- LLM serving engines: vLLM, SGLang, NVIDIA Triton
- Distributed training: DDP, FSDP, NCCL, mixed precision BF16/FP8
- RL knowledge: PPO, policy gradient, RLHF
- Model lifecycle ops: MLflow registry, shadow/A/B/canary, auto rollback
- Vector databases: pgvector or Qdrant
- Python and PyTorch; Flyte or equivalent ML orchestrator
🎁 Benefits
- Competitive total rewards package
- Opportunities for growth and advancement