Related skills
docker aws kubernetes pytorch airflow
Description
- Build scalable systems for ingesting and preprocessing large-scale video data for model training
- Design and scale distributed data pipelines for preprocessing and dataset refreshes
- Own workflow orchestration, job scheduling, monitoring, and failure recovery
- Implement containerized pipeline infrastructure using Kubernetes or a similar orchestrator
- Optimize cloud data storage and movement across AWS, GCS, or Azure for cost and throughput
- Define best practices for dataset storage, versioning, caching, retention, and access
Requirements
- Strong hands-on experience building or scaling large-scale data systems for ML
- Experience with distributed data processing frameworks such as PySpark or Ray, and orchestration tools such as Airflow
- Familiarity with containerization and container orchestration, including Docker and Kubernetes
- Experience with cloud-based data storage and compute (AWS, GCS, and/or Azure) including tradeoffs around cost, throughput, storage layout, and access patterns
- Familiarity with video and media processing tools such as FFmpeg, PyAV, and OpenCV
- Proficiency in Python and modern ML frameworks, with a strong preference for PyTorch or JAX
Benefits
- Competitive salary and generous equity
- Personal time off and paid holidays
- Health insurance
- Global travel insurance: Covers you when traveling internationally
- Monthly spending stipend: $500 (~S$635)
- Equipment: All equipment needed for your home office