Type: Full time
Related skills: Python, distributed systems, data pipelines, LLMs, synthetic data generation

Description
- Build and own the evaluation harness for agentic investigations
- Construct eval datasets representing real-world misuse across harm areas
- Measure agent performance end-to-end and drive improvements on hard harm areas
- Analyze coverage to identify measurement gaps and keep evals high-signal
- Productionize successful research into regression and release pipelines
- Build tooling that enables policy experts to author, run, and iterate on evaluations

Requirements
- Proficiency in Python and comfort across the stack
- Experience building and maintaining data pipelines
- Experience with LLMs and agentic systems with tool use and multi-step reasoning
- Strong data analysis skills and ability to derive insights from large datasets
- Ability to move between research prototyping and production-quality code
- Ability to translate ambiguous problems into concrete, testable experiments

Benefits
- Competitive compensation and benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
- Collaborative office space

Visa sponsorship