Related skills
AWS, SQL, Python, Kubernetes, Airflow
Description
- Architect, build, and scale distributed ETL/ELT pipelines across healthcare datasets.
- Lead Data Lake architecture for scalability, observability, reliability, and cost control.
- Own data quality, validation, normalization, and standardization workflows across thousands of sources.
- Design and optimize batch and near real-time data processing with cloud-native systems.
- Mentor engineers and drive production reliability across the platform.
- Collaborate with Product, Infrastructure, Security, and downstream teams to ensure secure data delivery.
Requirements
- 8+ years in data engineering or related fields building scalable data platforms.
- Strong proficiency in Python (PySpark), Java, Scala, or a similar language; advanced SQL expertise.
- Deep experience with Apache Spark and AWS big data services (EMR/Glue/S3/Athena/Redshift).
- Experience designing and scaling cloud-native data lake architectures with large-scale ingestion.
- Experience with Airflow or Argo, distributed storage, and the Parquet, Avro, or ORC file formats.
- Experience with Docker, Kubernetes, and modern containerization, plus monitoring and data quality in production.
Benefits
- Health insurance options and generous PTO.
- Pre-planned company wellness holidays.
- Retirement options.
- Health and charitable donation stipends.
- Business Resource Groups and flexible work hours.
- The opportunity to work with leading biotech and life sciences companies.