Senior Member of Technical Staff, Web Data

Type
Full time
Salary
Not provided

Related skills

Python, pandas, Apache Spark, data pipelines, data quality

📋 Description

  • Maintain large-scale pipelines for processing web corpora.
  • Implement filtering and quality-scoring systems for high-value web documents.
  • Analyze web data composition across domains, languages, and time.
  • Develop and maintain high-performance deduplication pipelines.
  • Collaborate with cross-functional teams to ensure data pipelines meet the demands of cutting-edge language models.

🎯 Requirements

  • Strong software engineering skills with Python and experience building data pipelines.
  • Familiarity with data processing frameworks such as Apache Spark, Apache Beam, Pandas, or similar tools.
  • Experience working with large-scale web datasets.
  • Knowledge of data quality assessment techniques and experimentation with data mixtures.
  • A passion for bridging research and engineering to solve complex data-related challenges in AI model training.
  • Bonus: papers at top-tier venues (NeurIPS, ICML, ICLR, AISTATS, MLSys, JMLR, AAAI, Nature, COLING, ACL, EMNLP).

🎁 Benefits

  • Open and inclusive culture and work environment
  • Work with a team on AI research
  • Weekly lunch stipend, in-office lunches and snacks
  • Full health and dental benefits, mental health budget
  • 100% parental leave top-up for up to 6 months
  • Remote-flexible with offices in major cities and coworking stipend