Location
Type
Full time
Related skills
llms rlaif dpo vllm post-training
Description
- Lead and execute post-training pipelines for LLMs (supervised fine-tuning, RL).
- Design advanced training paradigms such as DPO and GRPO.
- Develop domain-specific data recipes, curation, and augmentation.
- Train specialized small models from scratch: architecture, data, optimization.
- Build and refine Reward Models to support alignment and downstream optimization.
- Improve inference efficiency with low-latency serving (vLLM, SGLang).
Requirements
- Bachelor's degree in CS, AI, ML, or a related field; 8+ years of industry experience.
- Strong hands-on experience with post-training pipelines for large models.
- Deep familiarity with DPO, GRPO, and RL-based post-training methods.
- Experience training specialized small models from scratch.
- Solid understanding of RL fundamentals and alignment applications.
- Experience deploying models in low-latency production (vLLM, SGLang).
Benefits
- Competitive total compensation package
- L&D programs and education subsidy
- Team building programs and company events
- Wellness and meal allowances
- Healthcare schemes for employees and dependants
- Additional benefits disclosed during the process