Added: less than a minute ago
Type: Full time

Related skills

React, Postgres, Python, TypeScript, vector databases

📋 Description

  • Design and build a unified evaluation platform serving as the single source of truth for agentic systems and audit workflows
  • Build observability to surface agent behavior, trace execution, and capture production failures
  • Own the evaluation infrastructure stack, including integration with LangSmith and LangGraph
  • Develop automated pipelines to evaluate new models within hours of release
  • Design evaluation harnesses and frameworks to measure effectiveness, latency, and cost
  • Design prompts, retrieval pipelines, and agent orchestration to scale reliably

🎯 Requirements

  • Multiple years of experience shipping production software in complex, real-world systems
  • Experience with TypeScript, React, Python, and Postgres
  • Experience building and deploying LLM-powered features that serve production traffic
  • Experience implementing evaluation frameworks for model outputs and agent behaviors
  • Experience designing observability and tracing infrastructure for AI/ML systems
  • Experience working with vector databases, embedding models, and RAG architectures
  • Experience with evaluation platforms (LangSmith, Langfuse, or similar)

🎁 Benefits

  • Remote-first company; work from anywhere
  • Opportunity to shape enterprise AI for audit and advisory workflows
  • Collaborative, inclusive culture focused on growth
  • Chance to influence evaluation infrastructure at scale