Related skills
python, pytorch, reinforcement learning, dpo, vllm
Description
- Lead and execute post-training pipelines for LLMs (supervised fine-tuning, RL).
- Design and implement DPO and GRPO training paradigms.
- Develop domain-specific data recipes, curation, and augmentation pipelines.
- Post-train specialized small models from scratch (architecture, data).
- Build and refine Reward Models to support alignment.
- Design RLAIF closed-loop alignment systems.
Requirements
- Bachelor's degree in CS, AI, ML, or a related field, with 8+ years of industry experience.
- Hands-on experience across the full post-training pipeline for LLMs.
- Deep familiarity with preference learning and alignment methods: DPO, GRPO, and RL-based approaches.
- Experience designing domain-specific data strategies and training methodologies.
- Experience training and post-training specialized small models from scratch.
- Experience deploying models in low-latency production with vLLM and SGLang.
Benefits
- Competitive total compensation package
- Learning and development (L&D) and education subsidies for growth
- Team building programs and company events
- Wellness and meal allowances
- Comprehensive healthcare for employees and dependants
- More that we would love to tell you about during the process