Training: Process Management Engineer

Added: 1 day ago
Type: Full-time
Salary: Not provided

Related skills

Rust, Linux, Python, distributed systems, asynchronous programming

📋 Description

  • Work across our Python and Rust stack
  • Design, build, and maintain software to orchestrate ML workloads
  • Profile and optimize software for frontier-scale orchestration
  • Improve reliability and fault tolerance for long-running jobs
  • Debug distributed systems across large clusters
  • Respond to evolving ML needs to enable researchers

🎯 Requirements

  • Experience developing distributed systems (not just operating them)
  • Understanding of how large systems behave and fail at scale
  • Care deeply about performance, correctness, and reliability
  • Proficient in Python and Rust (or C++)
  • Strong Linux knowledge: debugging, performance analysis, and memory profiling
  • Comfortable with asynchronous and concurrent systems

🎁 Benefits

  • Hybrid work model: 3 days in the office per week
  • Relocation assistance for new employees

🚚 Relocation support
