Architect and maintain the Reddit Benchmark evaluation suite across Safety and Reddit knowledge.
Build scalable SFT pipelines with distributed training for instruction tuning.
Develop Model-as-a-Judge systems: automated evaluation pipelines that use strong models as graders.
Execute synthetic data strategies to improve model generalization.
Collaborate with Safety Engineering to translate safety policies into tests that run in CI/CD.
Debug post-training instability; inspect loss curves and evaluation logs.

4+ years of professional ML engineering experience with LLM fine-tuning or evaluation.
Fluency in Python and PyTorch with Hugging Face Transformers, vLLM, or lm-eval-harness.
Deep understanding of Instruction Tuning (SFT) and of how data quality affects its results.
Experience building Evaluation Pipelines and domain-specific benchmarks.
Familiarity with distributed training (FSDP/DeepSpeed) for fine-tuning.
Strong data engineering skills for curating and cleaning instruction datasets.

Senior Research Engineer, Post-training & Evaluation
