AI Engineer, Quality (Evals)

Type: Full time

Related skills

React, PostgreSQL, Python, TypeScript, LangGraph

📋 Description

  • Own enterprise-scale evaluation infrastructure for AI agents.
  • Build a unified evaluation platform that serves as the single source of truth for workflows.
  • Develop observability to surface agent behavior and failures.
  • Integrate LLMs, tools, retrieval, and logic into reliable agent experiences.
  • Create automated pipelines to evaluate models within hours of release.

🎯 Requirements

  • Multiple years shipping production software in complex systems.
  • Proficiency in TypeScript, React, Python, and PostgreSQL.
  • Built and deployed LLM-powered features in production.
  • Designed evaluation frameworks for model outputs and agent behaviors.
  • Worked with vector databases, embeddings, and RAG architectures.
  • Experience with evaluation platforms (LangSmith, Langfuse, or similar).

๐ŸŽ Benefits

  • Competitive compensation with meaningful ownership.
  • Flexible PTO.
  • 401(k).
  • Wellness benefits, including therapy sessions.
  • Technology and work-from-home reimbursement.
  • Flexible work schedules.