Description
Overview

We're looking for experienced Grafana power users to design expert-level evaluation tasks that test whether AI agents can use Grafana the way a real professional does. Your domain expertise is what makes these tasks authentic.

What You'll Do
- Design realistic, multi-step Grafana workflows: dashboards, alerting rules, data source configuration, panel setup, and cross-module operations
- Perform each workflow yourself on a hosted Grafana instance to produce a reference trajectory
- Write clear, specific task prompts with measurable outcomes that can be verified programmatically
- Implement programmatic graders that check whether each instruction was completed correctly
- Review AI agent attempts at your tasks, identify where and why they fail, and tag root causes
- Calibrate task difficulty so tasks are challenging but solvable, iterating on prompts and constraints based on model performance

Requirements
- 2+ years of daily, professional Grafana experience (SRE, platform engineering, observability, or similar)
- Deep familiarity with PromQL, dashboard templating, alerting pipelines, and data source configuration (Prometheus, InfluxDB, etc.)
- Ability to articulate workflows clearly enough for programmatic verification
- Comfort writing basic grading scripts (Python; engineering support provided as needed)

Nice to Have
- Experience with Grafana API automation
- Kubernetes/infrastructure monitoring background
- Familiarity with AI evaluation or benchmarking

Time Commitment
- 10-15 hrs/week minimum during the project
- Fast turnaround expected; responsiveness matters
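To make "programmatic graders" concrete, here is a minimal sketch of what one might look like. It assumes a hypothetical task that asked the agent to create a dashboard with a panel titled "CPU Usage" backed by a PromQL `rate()` query; the instance URL, token, and dashboard UID are illustrative placeholders, not details from this posting. It uses Grafana's standard HTTP API endpoint for fetching a dashboard by UID.

```python
# Hypothetical grader sketch for a Grafana evaluation task.
# Assumed task: the dashboard must contain a panel titled "CPU Usage"
# whose Prometheus target uses a rate() query. URL/token/UID below are
# placeholders for a real hosted instance.
import json
from urllib.request import Request, urlopen

GRAFANA_URL = "http://localhost:3000"        # assumed instance
API_TOKEN = "REPLACE_WITH_SERVICE_TOKEN"     # placeholder credential


def fetch_dashboard(uid: str) -> dict:
    """Fetch a dashboard's JSON model via Grafana's HTTP API."""
    req = Request(
        f"{GRAFANA_URL}/api/dashboards/uid/{uid}",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["dashboard"]


def grade(dashboard: dict) -> list:
    """Return a list of failed checks; an empty list means the task passed."""
    failures = []
    panels = dashboard.get("panels", [])
    cpu = next((p for p in panels if p.get("title") == "CPU Usage"), None)
    if cpu is None:
        failures.append("missing panel titled 'CPU Usage'")
    else:
        exprs = [t.get("expr", "") for t in cpu.get("targets", [])]
        if not any("rate(" in e for e in exprs):
            failures.append("CPU Usage panel has no rate() PromQL query")
    return failures


if __name__ == "__main__":
    result = grade(fetch_dashboard("task-01-dashboard"))
    print("PASS" if not result else f"FAIL: {result}")
```

Keeping the checks in a pure function over the dashboard JSON (rather than inline with the API call) lets each grading rule be unit-tested against saved reference trajectories without a live instance.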
Details
Category
Code Evaluation
Location
Remote
Employment Type
Independent Contractor
Languages Required
Posted
4/10/2026
Interested? Apply directly.