Design and iterate agent behaviors across real-world coding tasks.
Work with research to run evals, measure performance, and identify edge cases.
Improve performance via prompting, tool-use strategies, context construction, and model-facing experimentation.
Analyze production failures; improve robustness and reliability.
Build data systems to improve evaluation data.
Collaborate with product to shape user-facing agent interfaces.
Define what good looks like for end-to-end agent tasks.

🎯 Requirements

Have experience building or shipping ML or LLM-powered products.
Strong in Python and modern ML tooling.
Experience with model evaluation, fine-tuning, or prompt design.
Think in terms of systems and user outcomes, not just metrics.
Enjoy debugging messy, real-world failures and turning them into improvements.
Want to turn research and model potential into usable systems that actually work for users.
