Baseten

11-50

Post-Training Applied Researcher

Added: 22 days ago
Location: 🌍 North America
Type: Full time

Related skills: machine learning, LLM, PPO, DPO, SFT

📋 Description

Design and run post-training pipelines (SFT, GRPO, DPO, RLVR)
Build task-specific training environments and evals for the healthcare, code, and legal domains
Translate production data into training signals; design reward loops
Run end-to-end training experiments; diagnose reward hacking and drift
Publish findings and contribute to Baseten's open-source training libraries

🎯 Requirements

Hands-on LLM training with reinforcement learning (GRPO/PPO)
Strong reward engineering intuition; ability to distinguish effective rewards from exploitable ones
Experience building multi-turn agent environments with tool use
Comfort with the end-to-end ML pipeline, from data to deployment
Experience with production ML systems; experience with closed-loop production data preferred
Experience with RL training frameworks
Publications at NeurIPS/ICML/ICLR on RL for LLMs, reward modeling, or alignment

🎁 Benefits

Competitive pay with meaningful equity
100% medical, dental, and vision for you and dependents
Generous PTO including Winter Break
Paid parental leave
Company-facilitated 401(k)
Exposure to ML startups for learning and networking
