Added
less than a minute ago
Location
Type
Full time
Related skills
cloud, SQL, Python, ML, benchmarking
Description
- Own end-to-end evaluation across accuracy, latency, and metrics.
- Build and maintain benchmarking pipelines against competitors.
- Design experiments to measure the impact of model changes.
- Onboard, curate, and maintain evaluation datasets (public and internal).
- Create evaluation subsets to stress-test capabilities and edge cases.
- Collaborate with research and engineering teams to align with customer needs.
Requirements
- ML fundamentals: understand model training and evaluation.
- Strong Python skills; write evaluation scripts and data pipelines.
- SQL and cloud infrastructure experience.
- Metric intuition: define metrics capturing real-world performance.
- Voice agent stack familiarity: VAD, ASR, turn detection, LLM, TTS.
- Time zone: 3-4 hours of daily overlap with US Eastern Time required.
Benefits
- Fully remote team.
- Shape product through research.
- Pay transparency and pay equity.
- Collaborative, diverse team.