Related skills
python, machine learning, data pipelines, bioinformatics, large-scale data
Description
- Own data used by models; assess data scope and gaps.
- Build training datasets of proteins/molecules from diverse sources.
- Expand data beyond public databases using biological/chemical reasoning.
- Design benchmarks for biologically meaningful model capabilities.
- Plan data strategies with researchers; prioritize signals across modalities.
- Integrate diverse data across scales to create coherent training data.
Requirements
- PhD in computational biology, biophysics, chemistry, or a related field; 2+ years of postdoctoral or industry experience.
- Deep understanding of molecular interactions, protein structure, and the associated data.
- Experience with large-scale biological/molecular datasets: sourcing, cleaning, integrating, analyzing.
- Strong Python programming; building scalable data processing pipelines.
- Understand ML data needs: coverage, quality, balance, evaluation.
- Treat data construction as a research problem; think critically about signal and gaps.
Benefits
- We encourage new ideas, creativity and contrarian thinking.
- Healthy feedback-focused environment to help you grow and receive input.
- You own your day-to-day management and are responsible for hitting milestones.
- Competitive salary and equity in a growing startup.
- Excellent medical, dental, and vision coverage.