Role overview
CoreWeave is seeking a Director of Engineering for the Training Platform to lead the development and operation of scalable AI training infrastructure. You will oversee multiple engineering teams responsible for the data pipelines, model training workflows, and tooling that enable efficient, high-performance AI model training at scale.
Responsibilities
- Lead and grow the Engineering organization focused on the Training Platform.
- Define and drive the technical strategy, roadmap, and architecture.
- Design and maintain scalable data pipelines, training workflows, and tooling.
- Oversee cloud infrastructure (Kubernetes, AWS/GCP), CI/CD, monitoring, and security.
- Promote reliability, observability, and performance best practices.
- Collaborate with ML researchers, product, and operations teams.
- Hire, mentor, and develop engineering talent; manage performance.
- Plan budgets and schedules; ensure timely delivery.
Qualifications
- 10+ years of software engineering experience, including engineering leadership; prior experience building ML/training platforms or large-scale infrastructure.
- Strong background in distributed systems, microservices, and APIs.
- Proficiency in Python, Go, and Rust; knowledge of C/C++ is a plus.
- Experience with Kubernetes, Docker, Terraform, and cloud services (AWS/GCP).
- Familiarity with GPUs, NVIDIA CUDA, and HPC environments.
- Bachelor's degree in CS or a related field; advanced degree preferred.
- Excellent communication, leadership, and cross-functional collaboration skills.
Benefits
- Competitive salary and comprehensive benefits package.
- Generous equity and opportunities for growth.
- Health, dental, and vision insurance.
- Retirement plan options.
- Generous paid time off.