Added
less than a minute ago
Location
Type
Full time
Related skills
React, Postgres, Python, TypeScript, vector databases
Description
- Design and build a unified evaluation platform serving as the single source of truth for agentic systems and audit workflows
- Build observability to surface agent behavior, trace execution, and capture production failures
- Own the evaluation infrastructure stack, including integration with LangSmith and LangGraph
- Develop automated pipelines to evaluate new models within hours of release
- Design evaluation harnesses and frameworks to measure effectiveness, latency, and cost
- Design prompts, retrieval pipelines, and agent orchestration to scale reliably
Requirements
- Multiple years of experience shipping production software in complex, real-world systems
- Experience with TypeScript, React, Python, and Postgres
- Built and deployed LLM-powered features serving production traffic
- Implemented evaluation frameworks for model outputs and agent behaviors
- Designed observability/tracing infrastructure for AI/ML systems
- Worked with vector databases, embedding models, and RAG architectures
- Experience with evaluation platforms (LangSmith, Langfuse, or similar)
Benefits
- Remote-first company; work from anywhere
- Opportunity to shape enterprise AI for audit and advisory workflows
- Collaborative, inclusive culture focused on growth
- Chance to influence evaluation infrastructure at scale