Build and optimize production data pipelines powering AI personalization at scale
Own and tune Spark-based batch pipelines on GCP Dataproc
Build and maintain the ML Data Lake to ensure data quality and access
Support ML engineers and scientists with data for modeling
Identify bottlenecks and scale data pipelines and infrastructure
Collaborate with distributed systems engineers on platform evolution

🎯 Requirements

5+ years of data engineering experience
Spark (PySpark) DataFrames and scaling Spark workloads
Large-scale processing on GCP Dataproc
Python with testing, git, CI/CD
Parquet/Delta Lake storage formats
Docker, Kubernetes, GitHub Actions

🎁 Benefits

Medical, financial, and other benefits
Opportunity to work with leading brands
Collaborative, fast-paced environment
Exposure to data/ML at scale
