OKX

Staff AI Engineer, Model Post-Training and Alignment

Added

2 days ago

Location

🇸🇬 Singapore

Type

Full time

Salary

Salary not provided

Related skills

Python, PyTorch, reinforcement learning, DPO, vLLM

📋 Description

Lead and execute post-training pipelines for LLMs (supervised fine-tuning, RL).
Design and implement DPO and GRPO training paradigms.
Develop domain-specific data recipes, curation, and augmentation pipelines.
Post-train specialized small models from scratch (architecture, data).
Build and refine Reward Models to support alignment.
Design RLAIF closed-loop alignment systems.

🎯 Requirements

Bachelor's degree in CS, AI, ML, or a related field, with 8+ years of industry experience.
Hands-on experience across the full post-training pipeline for LLMs.
Deep familiarity with preference learning and alignment methods: DPO, GRPO, and RL-based approaches.
Experience designing domain-specific data strategies and training methodologies.
Experience training and post-training specialized small models from scratch.
Experience deploying models in low-latency production with vLLM and SGLang.

🎁 Benefits

Competitive total compensation package
Learning and development (L&D) and education subsidies for growth
Team building programs and company events
Wellness and meal allowances
Comprehensive healthcare for employees and dependants
More that we would love to tell you about during the process
