AI Usage Funnel — overview

See exactly which skill spent the money — and what one minute of TTS really costs

Master summary — the gist in 30 seconds

TL;DR: Turn the dashboard's AI Usage tab into a per-skill, per-category cost x-ray showing both tokens and dollars, with unit prices such as $/minute of /TTS audio.

Input: every AI call's real token counts + the price table you already have. Output: a tab that says 'Reply drafting spent $X, Enrichment $Y, TTS $Z (= $0.04 per audio minute)', so you instantly see what's expensive.

Why this matters: Today the money is a black box: one flat number, with no indication of which skill or task it came from. You can't optimise what you can't see. This makes spend legible per skill and per unit of work, which is the first step to cutting the costly ones.

flowchart LR
  A["Every AI call<br/>(app + laptop skills)"] --> B["usage.record<br/>tokens + cost + unit"]
  B --> C["keyed rows in<br/>the store"]
  C --> D["AI Usage tab<br/>per skill / per category"]
  D --> E["$/min TTS<br/>$/lead · $/call"]

1 · Where we are now — half of it already works

TL;DR: Every AI call inside the Modal app is already logged with real token counts. The plumbing exists.

Input: any Claude/Gemini call in the app. Every call flows through one client (extract.AI), which already records tokens × price via usage.py. Output: the flat 'AI Usage' table (task, calls, tokens, cost) shown on the dash today.

Why it matters: We are extending a working pipe, not laying a new one, which carries far less risk. The cheap keyed-read design (no slow scans) is already in place and must be preserved.

flowchart TD
  X["draft reply · extract · enrich · eval"] --> AI["extract.AI client<br/>(one funnel)"]
  AI --> L["_log -> usage.record"]
  L --> S["store: usage:date:promptkey"]
  S --> T["dash AI Usage table"]
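As a rough illustration of the keyed-row design in the diagram above, the sketch below accumulates calls, tokens, and cost under a `usage:date:promptkey` key. The in-memory `STORE`, the `PRICES` table with its placeholder per-million-token rates, and the `record` signature are all assumptions made for illustration; the real `usage.record` in usage.py may look quite different.

```python
from collections import defaultdict
from datetime import date

# Hypothetical price table: USD per 1M tokens as (input, output). Placeholder numbers.
PRICES = {"example-model": (1.00, 2.00)}

# Stand-in for the real store: one aggregated row per (date, prompt key).
STORE = defaultdict(lambda: {"calls": 0, "tokens_in": 0, "tokens_out": 0, "cost": 0.0})


def record(prompt_key, model, tokens_in, tokens_out, day=None):
    """Add one AI call to its keyed row and return the row's key."""
    day = day or date.today()
    price_in, price_out = PRICES[model]
    cost = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
    key = f"usage:{day.isoformat()}:{prompt_key}"
    row = STORE[key]
    row["calls"] += 1
    row["tokens_in"] += tokens_in
    row["tokens_out"] += tokens_out
    row["cost"] += cost
    return key
```

Because every call lands in a row addressed by its key, a reader such as the dash can fetch rows by key instead of scanning raw call logs, which is the property the section says must be preserved.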

2 · The five gaps to close

TL;DR: Five specific gaps stop the tab from showing per-skill, per-category, per-unit truth.

Input: today's flat per-task log. Gaps: (1) no category grouping, (2) some calls log as 'unknown', (3) no 'unit' field, (4) laptop skills (/TTS, finish-session-tts) log nothing, (5) the TTS model isn't even in the price table. Output once fixed: a real per-skill/category + unit-economics view.

Why it matters: Each gap maps to one small, surgical edit. Naming them up front keeps the build tight and stops scope creep: this is an extend job, not a rebuild.

mindmap
  root((5 gaps))
    No category dimension
    'unknown' holes
    No unit field
    Laptop skills unlogged
    TTS price missing
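To make the first two gaps concrete, here is a minimal sketch of a category roll-up over per-task rows. The task names, the category labels, and the row shape (`task`, `cost`) are hypothetical; the actual category map is one of the choices this document defers.

```python
from collections import defaultdict

# Hypothetical task -> category map; the real map is a deferred decision.
CATEGORY_BY_TASK = {
    "draft_reply": "Reply drafting",
    "enrich": "Enrichment",
    "tts": "TTS",
}


def rollup_by_category(rows):
    """Sum cost per category; unmapped tasks stay visible under 'unknown'."""
    totals = defaultdict(float)
    for row in rows:
        category = CATEGORY_BY_TASK.get(row["task"], "unknown")
        totals[category] += row["cost"]
    return dict(totals)
```

Keeping an explicit 'unknown' bucket, rather than dropping unmapped rows, means any remaining labelling holes show up as a dollar amount on the tab.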

3 · The laptop bridge — how /TTS gets its cost to the dashboard

TL;DR: Skills run on your laptop; the dash runs on Modal. A tiny POST endpoint bridges them.

Input: /TTS finishes and already knows the audio's duration in seconds + the model. Output: it POSTs {model, tokens/seconds, unit} to a small token-guarded /dash/api/usage/ingest endpoint, which writes it into the same store as the app's calls. One shared list, two sources.

Why it matters: This is the only genuinely new piece, and it is deliberately thin (one endpoint + one helper the skills share). It fails open: if the POST fails, /TTS still works. Instance 2 will pin down the exact auth token and helper.

flowchart LR
  TTS["/TTS on laptop<br/>knows duration_s"] -->|POST + dash token| IN["/dash/api/usage/ingest"]
  IN --> S["same store"]
  APP["app AI calls"] --> S
  S --> TAB["AI Usage tab"]
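The fail-open behaviour described above could be sketched as follows, using only the standard library. The host in `INGEST_URL`, the bearer-token header, the payload field names, and the helper names are all assumptions; the real auth scheme and helper are left to Instance 2.

```python
import json
import urllib.request

# Hypothetical URL: only the /dash/api/usage/ingest path comes from this document.
INGEST_URL = "https://example.invalid/dash/api/usage/ingest"


def _http_post(url, body, token, timeout=3.0):
    """POST a JSON body; raises on network or HTTP errors."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def report_usage(model, seconds, token, send=_http_post):
    """Report one TTS run. Returns True on success, False on any failure."""
    body = {"model": model, "unit": "audio_seconds", "units": seconds}
    try:
        status = send(INGEST_URL, body, token)
        return 200 <= status < 300
    except Exception:
        # Fail open: usage reporting must never break the skill itself.
        return False
```

A skill would call `report_usage(...)` after the audio is written and ignore the return value, so a dashboard outage costs a missing row rather than a failed /TTS run. The injectable `send` argument is there only to make the helper testable without a network.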

4 · Unit economics — the part you actually asked for

TL;DR: Log the 'unit' beside each call; dollars / units then gives the price you care about.

Input: each call logs its unit: audio seconds for TTS, lead count for enrich, transcript (call) count for call-parse. Output: $/minute of TTS = total TTS cost / (total seconds / 60); same shape for $/lead and $/call.

Why it matters: A raw dollar total tells you little; '$0.04 per audio minute' or '$0.012 per lead' is a number you can act on. This is the metric that lets you compare skills fairly and spot the expensive ones.

flowchart TD
  C["TTS cost (sum $)"] --> R["divide"]
  U["audio seconds / 60"] --> R
  R --> P["$ per minute of audio"]
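The division in the diagram above is small enough to write out. This is a minimal sketch with hypothetical function names; returning `None` when no units were logged is an assumption about how the tab should treat empty data, not something the source specifies.

```python
def cost_per_unit(total_cost, total_units):
    """Dollars per unit of work; None when no units were logged."""
    if total_units <= 0:
        return None
    return total_cost / total_units


def tts_cost_per_minute(total_cost, total_seconds):
    """$/minute of audio = total TTS cost / (total seconds / 60)."""
    return cost_per_unit(total_cost, total_seconds / 60)
```

For example, $0.08 of TTS spend over 120 seconds of audio works out to $0.04 per minute, and the same `cost_per_unit` helper covers $/lead and $/call once those units are logged.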

5 · One key, instrumented split — and a periodic honesty check

TL;DR: Keep ONE Gemini key. The per-skill split comes from internal labels, not separate keys. Reconcile against the real console now and then.

Input: one API key + our own labels (category, unit) on every call. Output: an instrumented estimate that IS the live truth on the tab. Periodically you open the Gemini/Anthropic billing console (via Chrome DevTools) and check our total against theirs to keep it honest.

Why it matters: Splitting by keys would be fragile and limited; labelling each call is flexible and as granular as you need. The console is the anchor of truth, checked monthly rather than read as a live feed, so the dashboard stays fast and keyed-read-only.

flowchart LR
  K["ONE Gemini key"] --> I["label every call<br/>category + unit"]
  I --> E["instrumented estimate<br/>(LIVE truth)"]
  E -. monthly compare .-> B["billing console<br/>(reconcile anchor)"]
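The monthly comparison could be as simple as the sketch below, which expresses the gap between the instrumented total and the console total as a relative drift. The function name and the 5% default tolerance are assumptions chosen for illustration; the source does not say how much drift is acceptable.

```python
def reconcile(instrumented_total, console_total, tolerance=0.05):
    """Compare our estimate with the billed total; drift is relative to the console."""
    if console_total <= 0:
        raise ValueError("console_total must be positive")
    drift = (instrumented_total - console_total) / console_total
    return {"drift": drift, "within_tolerance": abs(drift) <= tolerance}
```

A negative drift means the labels are under-counting (some calls are probably not logged yet); a positive drift suggests the price table is too high or calls are being counted twice.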

6 · What happens next in the chain

TL;DR: Instance 2 turns this into a concrete checklist + ADRs; Instance 3 builds and verifies on real pixels.

Input: this handoff (intent locked, gaps named, defaults proposed). Output: Instance 2 researches + writes checklist.md + ADRs for the 7 deferred technical choices; Instance 3 executes and QA-verifies the new tab, sentinel-gated.

Why it matters: Intent is fully locked here: no open questions remain that change WHAT we build. The only remaining decisions are HOW (for example ingest auth, category map, TTS price formula), which are safe to settle with code research. That's why the plan can move without stopping to ask you again.

timeline
  title Planning chain
  Instance 1 : Lock intent (this doc)
  Instance 2 : Research + checklist + ADRs
  Instance 3 : Build + pixel-verify the tab

Full technical handoff (HANDOFF.md) → Locked intent (intent.json) → Prompt for Instance 2