Added
less than a minute ago
Location
Type
Contract
Related skills
Docker, AWS, PostgreSQL, Python, LLM
Description
- Design a structured, configurable evaluation engine
- Combine deterministic checks with LLM-as-judge verdicts
- Build calibration workflows using expert-labeled examples
- Measure precision and recall properly
- Handle delayed outcomes and low-confidence review flows
- Store structured verdicts powering dashboards and analytics
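As an illustration only, the responsibilities above could be sketched roughly as follows. Every name here (`Verdict`, `evaluate`, `precision_recall`, the `min_confidence` threshold) is hypothetical and not taken from the team's actual system; the judge is passed in as a plain callable so that no specific LLM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: Optional[bool]  # None means "route to human review"
    confidence: float
    source: str             # "deterministic", "judge", or "review"


def evaluate(output: str,
             checks: list[Callable[[str], bool]],
             judge: Callable[[str], tuple[bool, float]],
             min_confidence: float = 0.8) -> Verdict:
    """Run deterministic checks first, then fall back to an LLM-style judge."""
    # Any failed deterministic check is treated as a final verdict.
    if not all(check(output) for check in checks):
        return Verdict(False, 1.0, "deterministic")
    passed, confidence = judge(output)
    # Low-confidence judge verdicts are flagged for review (assumed policy).
    if confidence < min_confidence:
        return Verdict(None, confidence, "review")
    return Verdict(passed, confidence, "judge")


def precision_recall(predicted: list[bool],
                     expected: list[bool]) -> tuple[float, float]:
    """Compare engine verdicts against expert labels."""
    pairs = list(zip(predicted, expected))
    tp = sum(p and e for p, e in pairs)
    fp = sum(p and not e for p, e in pairs)
    fn = sum(e and not p for p, e in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In a real engine the `judge` callable would wrap a call to an LLM provider, and each `Verdict` would be persisted as a structured record (for example a PostgreSQL row) so that dashboards can query it.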
Requirements
- 4+ years backend / ML engineering experience
- 2+ years building production AI/LLM systems
- Python, Docker and PostgreSQL experience
- Experience with AWS and with LLM APIs such as OpenAI and Anthropic
- Proven experience building LLM-based production systems
- Experience developing evaluation/QA/score pipelines
- Human-in-the-loop workflow design (a plus)
- OpenTelemetry familiarity (a plus)
Benefits
- Remote work with LATAM focus
- Independent contractor via payroll platform
- Remote work, allocated to a client