Software Engineer, Safeguards Evals

Added: 17 minutes ago
Type: Full time

Related skills

Python, distributed systems, data pipelines, LLMs, synthetic data generation

📋 Description

  • Build and own the evaluation harness for agentic investigations
  • Construct eval datasets representing real-world misuse across harm areas
  • Measure agent performance end-to-end and drive improvements on hard harm areas
  • Analyze coverage to identify measurement gaps and keep evals high-signal
  • Productionize successful research into regression and release pipelines
  • Build tooling that enables policy experts to author, run, and iterate on evaluations

🎯 Requirements

  • Proficiency in Python and comfort across the stack
  • Experience building and maintaining data pipelines
  • Experience with LLMs and agentic systems with tool use and multi-step reasoning
  • Strong data analysis skills and ability to derive insights from large datasets
  • Ability to move between research prototyping and production-quality code
  • Ability to translate ambiguous problems into concrete, testable experiments

🎁 Benefits

  • Competitive compensation and benefits
  • Optional equity donation matching
  • Generous vacation and parental leave
  • Flexible working hours
  • Collaborative office space

🛃 Visa sponsorship
