Added
1 minute ago
Location
Type
Full time
Related skills
rust python pytorch jax cuda
Description
- Build a roofline simulator to track workloads and analyze the impact of architecture changes.
- Debug gaps between performance simulation and real measurements; identify root causes.
- Write emulation kernels for low-precision numerics and lossy compression.
- Prototype numerics modules by pushing RTL through synthesis; own RTL end-to-end.
- Proactively pull in new ML workloads; evaluate the opportunities and risks they present.
- Understand the stack from ML science to hardware optimization; ship near-term deliverables.
Requirements
- Strong Python, plus C++ or Rust; ability to write clean, extensible code.
- Experience writing Triton, CUDA, or similar; mapping tensor ops to functional units.
- Working knowledge of PyTorch or JAX; large ML codebases a plus.
- Practical understanding of floating-point numerics and ML quantization tradeoffs.
- Deep understanding of transformer models, rooflines, and sharded training/inference.
- Experience writing RTL for floating-point logic; understanding of PPA (power, performance, area) tradeoffs a plus.
Benefits
- $230K–$460K USD plus equity.
- Relocation assistance available.
- Hybrid work: 3 days per week onsite in San Francisco.
Relocation support