Related skills
llms rlaif dpo vllm grpo
Description
- Lead and execute post-training pipelines for LLMs (supervised fine-tuning, RL).
- Design training paradigms such as DPO and GRPO for alignment.
- Develop domain-specific data recipes, curation, and augmentation.
- Post-train specialized small models from scratch (architecture, data).
- Build and refine Reward Models to support alignment.
Requirements
- Bachelor's degree in CS/AI or a related field.
- 8+ years of industry ML/AI experience.
- Hands-on experience with post-training pipelines for large models.
- Experience with reinforcement learning for alignment.
- Familiarity with domain-specific data strategies.
- Experience deploying models in production.
Benefits
- Competitive total compensation.
- Learning and development (L&D) programs and education subsidy.
- Team-building programs and company events.
- Wellness and meal allowances.
- Comprehensive healthcare for employees and dependents.