Related skills
Python, Kafka, Spark, Iceberg, Flink

📋 Description
- Build and evolve the data systems powering Cursor’s product.
- Design and operate large-scale batch data systems with Spark and Ray Data.
- Scale data ingestion pipelines to billions of rows per day.
- Re-architect prompt and model storage on S3, focusing on cost, performance, and usability.
- Build and maintain streaming data infrastructure (Kafka, Flink, or similar).
- Work across data warehouses and lakehouse table formats such as Iceberg and Delta Lake.
- Improve data developer experience for Python-heavy workflows.
- Support replication and change data capture pipelines (DMS, Debezium).
🎯 Requirements
- Deep experience with Spark (Databricks or open-source)
- Production experience with Ray Data
- Ownership of large data pipelines and storage systems
- Comfort debugging performance issues across compute, storage, and networking
- Clear thinking about data modeling and maintainability
- Experience running or scaling ClickHouse
- Familiarity with dbt, Dagster, or similar tooling