Overview
CoreWeave is seeking a Bare Metal Support Engineer to join our hardware operations team. This full-time role focuses on supporting our fleet of bare metal servers across multiple data centers. You will triage hardware issues, perform firmware updates, manage remote hardware consoles (IPMI/BMC), and collaborate with SRE and platform engineering to ensure high availability and performance of CoreWeave's cloud infrastructure.
Responsibilities
- Monitor and triage hardware incidents across data centers
- Diagnose and repair hardware failures (RAM, CPUs, storage); perform firmware updates and configure BIOS settings
- Manage IPMI/BMC consoles, remote management tooling, and out-of-band access
- Collaborate with SRE, networking, and hardware vendors to resolve issues and plan the hardware lifecycle
- Build and maintain runbooks and automation scripts (Python, Bash) to streamline tasks
- Maintain asset inventory and contribute to hardware provisioning processes
- Participate in on-call rotation for incident response
Requirements
- 3+ years in server hardware support, data center operations, or systems administration
- Strong Linux experience (Bash scripting, system administration) and comfort with Python
- Hands-on experience with IPMI/BMC and remote management tools
- Familiarity with networking basics and IT infrastructure
- Experience with automation and configuration management tools (e.g., Ansible)
- Excellent problem-solving, communication, and teamwork skills
- Ability to work in a fast-paced environment and participate in on-call rotations
- Bachelor's degree in Computer Science or Electrical Engineering, or equivalent experience
Nice to have
- Experience with cloud platforms or bare metal provisioning tools
- Familiarity with monitoring systems (Prometheus, Nagios)
Benefits
- Competitive salary and comprehensive benefits package
- Health, dental, and vision coverage
- 401(k) or equivalent retirement plan
- Generous paid time off and holidays
- Opportunities for growth and professional development