Related skills
SQL, Python, pandas, machine learning, observability
Description
- Run cross-layer perf investigations across throughput and latency to find root causes.
- Own and improve the correctness evaluation pipeline across hardware; lead regression investigations.
- Build observability and modeling tools for throughput, latency, reliability, and correctness.
- Partner with kernel, serving, routing, autoscaling, and capacity teams to land key optimizations.
- Rank opportunities by impact and effort; decline low-value items.
Requirements
- Hands-on perf engineering: profiling, roofline analysis, latency/throughput optimization, and root-cause investigation.
- Proficiency in Python; ability to read, instrument, and contribute to large production codebases you didn't write.
- Solid data analysis skills (e.g., SQL, pandas) to turn telemetry into clear findings.
- Ability to communicate quantitative results clearly in writing to influence priorities on teams you don't manage.
- Genuine interest in correctness as an engineering discipline: numerics, evaluation design, regression detection.
- Experience with ML systems, especially training infrastructure, inference infrastructure, or large-scale inference.
Benefits
- Optional equity donation matching
- Generous vacation and parental leave
- Flexible working hours
- Lovely office space
Visa sponsorship