---
title: "A utility rate sheet is only half your bill"
date: 2026-06-28T00:00:00.000-04:00
author: "Chris Betz"
url: https://www.cbetz.com/blog/rate-sheet-is-half-your-bill
---

# A utility rate sheet is only half your bill

_A utility rate sheet prices only about half your bill. I re-parsed PECO's filed tariff with an LLM, checked it against a real bill, and an open engine reproduces the total to 0.0007% once it has the other half: generation, transmission, and riders._

I am building [Ratebook](https://github.com/cbetz/ratebook), an open dataset and engine for US residential electricity tariffs. The goal is narrow and concrete: given a plan and your usage, compute the bill, exactly, and show its parts. The first real end-to-end test was PECO’s Residential Service (Rate R), the default plan for most of Philadelphia. I parsed the filed tariff PDF with an LLM, ran the result through the engine, and compared it to an actual bill. The extraction worked on the first pass. The bill-match taught me the thing this project exists for: **a rate sheet is not a bill.** The filed sheet prices only about half of what you actually pay.

## What is on the sheet, and what the model pulled

PECO Rate R (Supplement 21, effective 2026-01-01) is a simple plan on paper. An extractor (Claude with structured output, reading the tariff PDF natively) returned the **distribution** component verbatim: `$0.10276/kWh`, flat, all kWh, plus an `$11.30/month` fixed charge, no tiers, no time-of-use. I checked every field against the PDF by hand. It was right, and it was also right about what it _refused_ to price: generation (“refer to the Generation Supply Adjustment”) and transmission are named on the sheet but priced in separate filings, and nine riders appear with no number on the page at all.

The part that earned its keep on day one was not the model. It was a deterministic validator that re-checks the extracted record before the engine ever sees it. It caught a bug, not in the extraction, but in our own converter: the PECO sheet lists two fixed charges, `$11.30` standard and `$2.19` for legacy “former off-peak” meters, that are **mutually exclusive**, and the first converter blindly summed them (it would have billed `$13.49`). A structured check found what a human skim missed. That is the whole thesis of building eval-first, in miniature.
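
To make that failure concrete, here is a minimal sketch of the kind of check that catches it. It is not Ratebook's actual validator: the `FixedCharge` record, the `exclusive_group` field, and `monthly_fixed_total` are hypothetical names chosen for illustration. Only the two dollar amounts come from the PECO sheet.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class FixedCharge:
    """Hypothetical record shape for one fixed monthly charge."""
    label: str
    amount: Decimal
    # Charges that share an exclusive_group are alternatives: a customer
    # pays exactly one of them, never the sum.
    exclusive_group: Optional[str] = None


def monthly_fixed_total(charges, selected):
    """Sum fixed charges, taking exactly one member of each exclusive group.

    `selected` maps an exclusive group name to the label that applies.
    Raises ValueError rather than silently summing alternatives.
    """
    total = Decimal("0")
    groups = {}
    for charge in charges:
        if charge.exclusive_group is None:
            total += charge.amount
        else:
            groups.setdefault(charge.exclusive_group, []).append(charge)
    for group, members in groups.items():
        if group not in selected:
            raise ValueError(f"no selection for exclusive group {group!r}")
        matches = [c for c in members if c.label == selected[group]]
        if len(matches) != 1:
            raise ValueError(f"{selected[group]!r} is not a unique member of {group!r}")
        total += matches[0].amount
    return total


# The two fixed charges listed on the PECO Rate R sheet.
rate_r_fixed = [
    FixedCharge("standard", Decimal("11.30"), exclusive_group="customer_charge"),
    FixedCharge("former_off_peak", Decimal("2.19"), exclusive_group="customer_charge"),
]

# What a converter that blindly sums every fixed charge would bill.
naive_total = sum((c.amount for c in rate_r_fixed), Decimal("0"))

# What a standard-meter customer should be billed.
standard_total = monthly_fixed_total(rate_r_fixed, {"customer_charge": "standard"})
```

Here `naive_total` comes out to `13.49` and `standard_total` to `11.30`. The design choice worth noting is that a missing selection raises an error instead of falling back to a sum.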

## The finding: a rate sheet is not a bill

PECO Rate R, the actual filed tariff, prices the **distribution** component only. So I got a real bill to see how much that leaves out. The statement was for a 30-day period, **1,244 kWh**, total **$276.35**. Here is how it splits:

- **Electric Delivery** (distribution): $139.27, which is **50.4%** of the bill.
- **Electric Supply** (generation + transmission): $137.14, which is **49.6%**.
- Taxes and fees: -$0.06.

The filed rate sheet covers the **Delivery** half. The other half, just about dollar-for-dollar, lives in documents the sheet only points to: the generation Price to Compare (a separate PECO filing on a separate quarterly cadence, or your competitive supplier’s rate if you shop), the transmission service charge, and the riders. This is the “the app lied about my bill” failure mode made concrete on one real plan: price the sheet, call it the bill, and you are 50% low.
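
The split is easy to re-derive from the three bill lines. A short sketch, using only the figures quoted from the statement; the dictionary keys and the `share_pct` helper are my own names for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP

# The three top-level lines from the statement.
bill_lines = {
    "Electric Delivery": Decimal("139.27"),
    "Electric Supply": Decimal("137.14"),
    "Taxes and fees": Decimal("-0.06"),
}

bill_total = sum(bill_lines.values(), Decimal("0"))


def share_pct(amount, total):
    """Percentage of the total, rounded to one decimal place."""
    return (amount / total * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


delivery_share = share_pct(bill_lines["Electric Delivery"], bill_total)
supply_share = share_pct(bill_lines["Electric Supply"], bill_total)
```

The lines sum to `276.35`, with delivery at `50.4` percent and supply at `49.6` percent of the total.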

## Does the engine actually work? To a fraction of a cent

A finding about missing components is only useful if the engine is correct once it _has_ them. So I fed every line from the bill through it:

- **Distribution**: $0.10276/kWh, totaling $127.83, from the tariff PDF we extracted.
- **Customer charge**: $11.29 (the tariff PDF lists $11.30; the bill is 1 cent under).
- **DSIC rider**: $0.15, a separate filing.
- **Generation**: $0.10237/kWh, totaling $127.35, from the Price to Compare (a separate filing).
- **Transmission**: $0.00787/kWh, totaling $9.79 (separate).
- **State tax adjustment**: -$0.06 (separate).

The distribution rate the model pulled from the PDF reproduces the bill’s distribution line **exactly** (1,244 x 0.10276 = $127.83). Feeding all components to the engine over the 30-day, 1,244-kWh period yields **$276.352** against the actual **$276.35**, a **0.0007% error**, far inside the 2% bar I hold the project to. It is a committed test (`test_billmatch_peco.py`), built from rate and usage facts only, with no account number or address.
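
The arithmetic can be checked by hand. The sketch below is not the Ratebook engine or its API, just the same sum in plain `Decimal`, assuming the per-kWh lines are multiplied out unrounded and added to the flat lines. The variable names are mine; every number comes from the list above.

```python
from decimal import Decimal

usage_kwh = Decimal("1244")

# Per-kWh components, in dollars per kWh.
per_kwh_rates = {
    "distribution": Decimal("0.10276"),
    "generation": Decimal("0.10237"),
    "transmission": Decimal("0.00787"),
}

# Flat components for the billing period, in dollars.
flat_amounts = {
    "customer_charge": Decimal("11.29"),
    "dsic_rider": Decimal("0.15"),
    "state_tax_adjustment": Decimal("-0.06"),
}

# Multiply out each per-kWh line without rounding, then add the flat lines.
energy_lines = {name: rate * usage_kwh for name, rate in per_kwh_rates.items()}
computed_total = sum(energy_lines.values(), Decimal("0")) + sum(
    flat_amounts.values(), Decimal("0")
)

actual_total = Decimal("276.35")
error_pct = abs(computed_total - actual_total) / actual_total * 100

# Rounded to cents, each energy line should match the printed bill line.
rounded_lines = {name: amount.quantize(Decimal("0.01")) for name, amount in energy_lines.items()}
```

Under that assumption `computed_total` is `276.352`, the rounded energy lines are `127.83`, `127.35`, and `9.79`, and `error_pct` lands just above `0.0007`.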

I want to be precise about what that proves. It proves the **engine** is correct: given a bill’s components, it reconstructs the total to the penny. It does **not** prove end-to-end extraction. The distribution component came from the extracted PDF, but the generation, transmission, and rider values came from the bill itself. Sourcing those from utility documents, so a bill can be reproduced _without_ already having the bill, is the next milestone, not a solved problem.

## Then why not just use the open dataset?

Most tools that touch US rates start from NREL’s [URDB](https://openei.org/wiki/Utility_Rate_Database), the Utility Rate Database: roughly **58,866 records**, public domain, genuinely valuable, the backbone of a lot of energy software. I am building on it too. But I loaded the June 2026 bulk snapshot and profiled it (every headline number re-derived by an adversarial checker pass), and on its own it cannot answer “what is my bill,” for structural reasons that the PECO case makes tangible:

- **It is bundled, not decomposed.** The `servicetype` field is missing on **71.6%** of records, so for most rates you cannot tell whether the number is delivery-only, supply-only, or all-in. URDB’s record for PECO Rate R carries a single `$0.21884/kWh`. In this snapshot that bundled figure is actually _close_ to the true all-in rate, which is exactly the trap: it looks right, you cannot see that it is one stale blend of three components, and it drifts the moment PECO’s quarterly Price to Compare moves.
- **It is mostly old.** **74%** of active residential rates carry `latest_update` timestamps from before 2016 (loaded in a 2015 bulk import and never revisited), yet with no end date they read as current. URDB actively maintains on the order of **150 utilities a year** against an EIA universe of about 3,000.
- **Riders live in prose.** About **27%** of active residential rates whose description mentions a rider, adjustment, fuel charge, or surcharge carry no structured value for it. Those are precisely the lines that break a 2% match.
- **Whole markets are missing.** URDB contains **zero** Texas retail (REP) plans, the prices that actually set most Texans’ bills.

None of this is a knock on URDB as an archive. It is a knock on treating any single bundled rate as a bill.
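
For readers who want to run this kind of profile themselves, here is a rough sketch of the missing-field measurement. It is not the script behind the numbers above. It assumes records are loaded as dictionaries and that “missing” means an absent key, `None`, or an empty string, which may not match how the bulk export encodes it. The `toy_records` are invented placeholders, not URDB data.

```python
def missing_share(records, field):
    """Fraction of records where `field` is absent, None, or an empty string."""
    if not records:
        raise ValueError("no records to profile")
    missing = sum(1 for record in records if record.get(field) in (None, ""))
    return missing / len(records)


# Invented placeholder records, only to exercise the function.
toy_records = [
    {"label": "a", "servicetype": "placeholder-value"},
    {"label": "b", "servicetype": ""},
    {"label": "c"},
    {"label": "d", "servicetype": None},
]

toy_share = missing_share(toy_records, "servicetype")
```

On the toy list three of four records count as missing, so `toy_share` is `0.75`. The same function applies to any other field you want to profile.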

## What Ratebook is

The shape of the fix follows directly from the finding:

- **A schema that decomposes** a tariff into priced components, referenced-only components, and riders, so the engine prices what it has and stays explicit about what it does not. Refusal is a typed return value; “unknown” is a first-class answer, never a partial number dressed up as a total.
- **A deterministic engine**, `Decimal` end to end, with a Python implementation and a TypeScript port held to the same JSON test vectors, so a result is reproducible across languages and over time.
- **An open dataset** of 65 engine-validated tariffs across 37 utilities, dedicated to the public domain (CC0), with provenance and a confidence flag on every record, plus a Home Assistant integration for the people who want this running at home.
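
To illustrate what “refusal as a typed return value” can look like, here is a small sketch. It is not Ratebook's schema or engine API: `Component`, `Priced`, `Unknown`, and `price` are hypothetical names, and the model is reduced to flat per-kWh rates.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Component:
    name: str
    # None means the sheet names the component but prices it elsewhere.
    rate_per_kwh: Optional[Decimal]


@dataclass(frozen=True)
class Priced:
    total: Decimal


@dataclass(frozen=True)
class Unknown:
    # Names of the components that have no price on the sheet.
    missing: Tuple[str, ...]
    # What the priced components add up to; explicitly not a bill total.
    priced_subtotal: Decimal


def price(components, usage_kwh) -> Union[Priced, Unknown]:
    """Return a total only when every component carries a price."""
    subtotal = sum(
        (c.rate_per_kwh * usage_kwh for c in components if c.rate_per_kwh is not None),
        Decimal("0"),
    )
    missing = tuple(c.name for c in components if c.rate_per_kwh is None)
    if missing:
        return Unknown(missing=missing, priced_subtotal=subtotal)
    return Priced(total=subtotal)


# The PECO sheet as filed: distribution is priced, the rest is referenced only.
sheet_only = [
    Component("distribution", Decimal("0.10276")),
    Component("generation", None),
    Component("transmission", None),
]
result = price(sheet_only, Decimal("1244"))
```

With the sheet alone, `result` is an `Unknown` that names `generation` and `transmission` as missing. A caller has to handle that case explicitly and cannot mistake the distribution subtotal for the bill.
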
## Caveats, because this is easy to overstate

- The bill-match is **N=1**: one plan, one bill. It proves the engine, not the population. More bill-matches across more utilities are the next milestone.
- The 0.0007% used the bill’s own generation, transmission, and rider values; only distribution came from the extracted PDF. **End-to-end extraction is not done.**
- The URDB freshness numbers come from a **single June 2026 snapshot**. “Decaying” is defensible; I am not claiming the database is permanently frozen.
- The clean “half the bill” split is specific to **restructured states** that separate delivery from supply (Pennsylvania is one). In a vertically integrated state, one bundled tariff can be most of the bill. The lesson generalizes to “know which components your number includes,” not to a universal 50%.

Everything here is open and checkable: the dataset is CC0, the engine is Apache-2.0, and the bill-match is a committed test at [github.com/cbetz/ratebook](https://github.com/cbetz/ratebook). If you want your utility covered, tell me the utility and rate schedule and I will add it. The single most useful thing you can send is a real bill’s line items, rates and usage only, no account or address, for a plan I have not matched yet.
