Hardware Engineer (GPU TS)

Added
3 days ago
Type
Full time

Related skills

ansible grafana prometheus python infiniband

📋 Description

  • Troubleshoot GPU and PCIe failures.
  • Partner with vendors on failure analysis.
  • Track component RMAs.
  • Automate server hardware lifecycle.
  • Lead hardware escalation and troubleshooting.
  • Define hardware requirements, specs and playbooks.

🎯 Requirements

  • 2+ years of data center GPU troubleshooting (H100+; InfiniBand; NVLink).
  • Proficiency in Ansible and Python; IPMI or Redfish (Redfish preferred).
  • Experience automating GPU diagnostics with Prometheus & Grafana.
  • Deep knowledge of GPUs, PCIe and server hardware.
  • Experience collaborating with hardware vendors to create alerts and playbooks.
  • Strong automation mindset with thorough documentation.

๐ŸŽ Benefits

  • Paid medical, dental, and vision insurance.
  • Company-paid life and disability insurance.
  • 401(k) with employer match; ESPP options.
  • Tuition reimbursement; flexible PTO.
  • Wellness programs and HSA/FSA.
  • Hybrid work environment with remote options.