PhD Rater

AI Evaluation
$70 - $120 per hour

Job Description

Deep expertise in data science, machine learning, finance, and/or Python-based coding
Current PhD student or recent PhD graduate (top U.S.-based school)
Strong research background in frontier STEM topics
Ability to engage reliably for 30+ hours/week, primarily on weekdays
Demonstrated technical output such as high-quality open-source contributions (especially in agentic / LLM tooling ecosystems)
Comfort reading and reasoning about agent behavior traces to diagnose failure modes beyond surface-level errors

Initial focus area: agentic workflows for STEM tasks
Familiarity with agentic frameworks and OSS ecosystems is helpful (examples include LangChain, MetaGPT, AutoGen, AutoGPT, CrewAI, LlamaIndex, BabyAGI, SuperAGI, CAMEL, AgentGPT, and Dify)
Deliverables are expected to be reproducible and testable (clear specs, deterministic tests where possible, documented environments)

Our investors include Benchmark, General Catalyst, Adam D’Angelo, Larry Summers, and Jack Dorsey.
Thousands of professionals across domains like law, the creative fields, engineering, and research have joined to work on frontier projects shaping the next era of AI.