Related skills
Python, Kafka, Spark, PySpark, Google Cloud Platform
Description
- Build and optimize production data pipelines powering AI personalization at scale
- Own Spark-based batch pipelines on GCP Dataproc, including performance tuning
- Build and maintain the ML Data Lake to ensure data quality and access
- Support ML engineers and scientists with data for modeling
- Identify bottlenecks and scale data pipelines and infrastructure
- Collaborate with distributed systems engineers on platform evolution
Requirements
- 5+ years of data engineering experience
- Spark (PySpark) DataFrames and scaling
- Large-scale processing on GCP Dataproc
- Python with testing, git, CI/CD
- Parquet/Delta Lake storage formats
- Docker, Kubernetes, GitHub Actions
Benefits
- Medical, financial, and other benefits
- Opportunity to work with leading brands
- Collaborative, fast-paced environment
- Exposure to data/ML at scale