---
title: "taskproof"
date: 2026-06-17T00:00:00.000-04:00
tags: ["typescript", "python", "playwright", "claude"]
url: https://www.cbetz.com/portfolio/taskproof
---

# taskproof

_CI harness that tests whether real AI agents can complete tasks on your site._

taskproof checks whether AI agents, not just humans, can actually use your website. You describe each task in YAML as a natural-language goal plus deterministic success assertions. taskproof then drives multiple agent harnesses (Claude computer-use, browser-use) through those tasks in parallel and grades the results with pass@k, the chance that at least one of k attempts succeeds, which tolerates the agents' non-determinism.
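To make the pass@k grading concrete, here is a minimal sketch of the commonly used unbiased pass@k estimator. It is an illustration under stated assumptions, not taskproof's actual code: the function name `pass_at_k` and its parameters are hypothetical, and taskproof may compute the metric differently (for example, by simply checking whether any of k runs passed).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n recorded attempts, c of which succeeded.

    Hypothetical helper: returns the probability that at least one of k
    attempts, drawn without replacement from the n recorded attempts,
    is a success.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) counts the ways to pick k attempts that all failed;
    # it is 0 when fewer than k attempts failed, which gives pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: an agent completed a task in 2 of 4 attempts.
print(pass_at_k(n=4, c=2, k=1))  # 0.5
print(pass_at_k(n=4, c=2, k=2))
```

With k = 1 the estimate is just the raw success rate. Raising k makes the grade more forgiving of a flaky run, at the cost of more agent attempts per task.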

taskproof renders an interactive HTML report with per-step screenshots, cost breakdowns, and baseline diffs, so CI catches agent-usability regressions the way it catches broken tests.
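One way a baseline diff could turn into a CI failure is sketched below. This is an assumption about the general technique, not a description of taskproof's report format: the function `find_regressions`, the per-task pass-rate dictionaries, and the `tolerance` parameter are all hypothetical.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.1,
) -> dict[str, tuple[float, float]]:
    """Return tasks whose pass rate dropped by more than `tolerance`.

    Hypothetical helper: both inputs map a task name to a pass rate in
    [0, 1]. The result maps each regressed task to (old_rate, new_rate).
    """
    regressions: dict[str, tuple[float, float]] = {}
    for task, old_rate in baseline.items():
        # Assumption: a task missing from the current run counts as failing.
        new_rate = current.get(task, 0.0)
        if old_rate - new_rate > tolerance:
            regressions[task] = (old_rate, new_rate)
    return regressions


baseline = {"checkout": 1.0, "search": 0.8}
current = {"checkout": 0.5, "search": 0.75}
print(find_regressions(baseline, current))  # {'checkout': (1.0, 0.5)}
```

A CI step could exit non-zero whenever the returned dictionary is non-empty. The tolerance keeps small run-to-run fluctuations in agent behavior from failing the build.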
