Related skills
data pipelines, large language models, reinforcement learning, fine-tuning, Claude
Description
- Own end-to-end RL environment creation for new capabilities.
- Improve and execute fine-tuning for Claude in new domains.
- Manage external data vendors; evaluate data quality and rewards.
- Collaborate with domain experts on data pipelines and evals.
- Explore RL environment designs for high-value tasks.
- Develop QA frameworks to detect reward hacking and assess environment quality.
Requirements
- Experience fine-tuning LLMs for specific domains or real-world use cases.
- Experience with RL, reward design, or data curation for LLMs.
- Comfortable managing vendor relationships and rapid iteration loops.
- Strong project management and interpersonal skills.
- Bachelor's degree or equivalent experience.
- Excited about a role combining ML research, data ops, and PM.
Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Office space for collaboration.
Visa sponsorship