Added: 5 hours ago
Location:
Type: Full time
Salary: Not provided
Related skills: java, terraform, redis, python, pytorch
Description
- Design, build, and operate standardized training-to-serving pipelines with Airflow for SageMaker endpoints.
- Run real-time and batch inference on SageMaker, including multi-model endpoints and autoscaling.
- Deliver ultra-low-latency serving with Redis/Valkey for feature caching and online retrieval.
- Provision ML infrastructure with Terraform: SageMaker endpoints, ECR/ECS/EKS, VPC, IAM.
- Build platform abstractions and golden paths: Airflow DAGs, CLI/SDKs, CI/CD pipelines.
- Govern model lifecycle: registries, approvals, lineage, and audits.
Requirements
- 5+ years building production-grade ML/data platforms.
- Strong software engineering in Python, Go, or Java with APIs and tooling.
- Deep experience with AWS SageMaker inference: endpoint config, containerization, autoscaling.
- Expertise with online feature stores like Redis/Valkey for ML serving.
- Terraform experience managing ML and data infrastructure end-to-end (GitOps preferred).
- Airflow orchestration at scale: DAGs, sensors, retries, SLAs, backfills.
Benefits
- Team lunches and game nights
- Company-wide events and socials
- Hybrid in-office model in Canada (Toronto & Montreal)
- Growth-focused culture with data-driven learning