Related skills
docker python kubernetes tensorflow pytorch
Description
- Designing, building, and maintaining scalable ML platforms, tools, and infrastructure for training, experimentation, and evaluation of models
- Migrating workloads from Sun Grid Engine (SGE) to Kubernetes-based systems such as Slurm or Ray to improve scalability and reliability
- Modernising ML workflows with tools for pipeline orchestration, experiment tracking, observability, and model versioning
- Optimising ML infrastructure to maximise GPU utilisation, scheduling, and distributed training performance
- Building and maintaining CI/CD workflows for ML systems, including model versioning, testing, and release automation
- Collaborating with ML engineers to ensure models are deployable, reproducible, and production-ready
Requirements
- Python skills for production systems
- Proven ability to optimise model training and inference, including GPU-based and distributed workloads
- Experience with containerisation and orchestration tools such as Docker and Kubernetes
- Strong background in monitoring, observability, and reliability for ML platforms
- Hands-on experience across cloud, on-premises, and hybrid infrastructure environments
- Solid understanding of ML fundamentals including training, evaluation, and lifecycle management
Benefits
- Hybrid work with 2-3 designated office days per week
- Flexible working, team lunches, and birthday celebrations
- Private medical and dental cover for you and your family; pension/401K matching
- Working from home allowance for tech or home office equipment
- Global working opportunities and generous holiday allowance
- Parental leave support including adoption and reproductive health services