Skip to content

Data Architecture

Data Architecture

Looking for the column-level catalog? See Tables, Functions, Triggers. This page is the architecture narrative — those three catalogs are the source-of-truth mirror of supabase/migrations/*.sql.

CHANGE SUMMARY

  • Updated: Reflected the live managed AI usage writer (anthropic-proxy Edge Function + ai_usage_daily) and how rate limits are enforced (supabase/functions/anthropic-proxy/index.ts, supabase/migrations/0004_ai_usage_tracking.sql, 0005_upsert_ai_usage_fn.sql).
  • Updated: Clarified how BYOK vs. managed paths influence payload destinations and risks (src/api.js, src/supabase.js).
  • Updated (2026-04-29, SUR-92): Added full per-user quota model — user_profiles.month_limit (default 50), allocation_override, getResolvedMonthlyLimit. getMonthlyUsage sums across all action types. Trigger name confirmed as trg_ai_usage_month_delta (function: apply_ai_usage_month_delta). emitTelemetry is fire-and-forget via fireAndForget().
  • Updated (2026-05-02, SUR-233): Added wrapped_key_blobs entity and device_label column; documented E2EE key-wrapper table and how device labels are stored at enrolment time.
  • Updated (2026-05-02, SUR-237): wrapped_key_blobs.created_at now gets a server-side DEFAULT ((extract(epoch from now()) * 1000)::bigint) via migration 0014, removing the client-clock dependency. Added select_fresh_transfer_blob(p_user_id, p_max_age_ms) SECURITY DEFINER RPC — the authoritative TTL gate for transfer-v1 blobs; redeemDeviceTransfer calls this instead of the old client-side age check.
  • Updated (2026-05-07, SUR-85): Stripe billing infrastructure added in migration 0019_stripe_billing.sql. Four additive columns on user_profiles (stripe_customer_id, stripe_subscription_id, subscription_status, subscription_current_period_end) and a new stripe_webhook_events idempotency ledger keyed on Stripe’s event.id. The lapsed-Pro 30-day grace window is enforced at read time by getResolvedEntitlements rather than via a daily cron flip — user_tier stays 'pro' after subscription.deleted and the resolver returns 'free' only once subscription_current_period_end + 30 days has passed. See Stripe billing below.
  • Updated (2026-05-12, SUR-358 cluster / SUR-368): Access control relaxed for the open-signup cutover. Migration 0021_drop_waitlist_rls_gate.sql (SUR-363) dropped the waitlist-EXISTS predicate that migration 0007 had layered onto the RLS policies for books, notes, custom_ideas, and the note-images storage bucket. RLS is now ownership-only — auth.uid() = user_id is the sole gate — restoring the 0001 contract. The waitlist gate was redundant the moment SUR-362’s handle_new_auth_user() trigger (migration 0020) stopped requiring an approved waitlist_requests row to materialise a user_profiles row; without 0021 a direct signup would hold a valid JWT and a profile row but silently 403 every data read. The migrations co-deploy (same PR / branch) so the regression window between “trigger no longer back-links” and “RLS no longer requires the back-link” never opens in production. See Access control / RLS below for the consolidated picture. Rollback procedure documented in docs/runbooks/open-signup-rollback.md (SUR-368).
  • Updated (2026-06-03, SUR-527): Device-local embeddings store (Dexie v13). The first table that is local-only by construction — never enqueued, absent from flushOutbox/fetchAllCloud, and excluded from the JSON export. Stores per-capture vectors (seeded on-device for future semantic search, SUR-157) quantized to int8 and encrypted at rest with the Master Key (src/crypto/byteEncryption.js, AAD-bound to noteId). Generation is dark-launched behind VITE_ON_DEVICE_EMBEDDINGS and gated on passkey unlock; a per-device Settings control governs the model download. SCHEMA_VERSION → 13. The companion lexical Search view (/search) indexes the already-decrypted in-memory notes (MiniSearch), not this store.
  • Updated (2026-05-30, SUR-316): Prompt versioning v1. Two new service-role-only tables: prompts (migration 0027, seeded 0028) holds the versioned managed-AI system prompts the anthropic-proxy Edge Function loads per call (one active row per name, fail-open to code constants); ai_usage_events (migration 0029) is an append-only per-call log tagging each managed call with prompt_name + prompt_version. ai_usage_daily and upsert_ai_usage are unchanged — quota accounting is untouched; ai_usage_events is a parallel analytics sink written best-effort via the new insert_ai_usage_event RPC. Migration 0030 revokes EXECUTE on upsert_ai_usage from anon/authenticated (pre-existing gap). See Prompt versioning below and the catalog entries: Tables → prompts, ai_usage_events, Functions → insert_ai_usage_event.
  • Updated (2026-06-16, SUR-605): Synced lexicon tables documented (SUR-598 backfill). Three new synced, per-user, RLS-own-rows tables: lenses (Dexie v15, migration 0042) and collections + collection_memberships (Dexie v16, migration 0043), each with SUR-499 explicit REVOKE/GRANT. Captures the load-bearing invariant — lenses and collections resolve at read time, never stored as nodes or edges — plus the offline-first sync/tombstone/cascade lifecycle and export/import coverage. SCHEMA_VERSION16. See Lenses & collections below and the catalog entries: Tables → lenses, collections, collection_memberships.
  • Updated (2026-06-20, SUR-580): Device-local discovery_jobs store (Dexie v17) — bulk idea discovery. The second table that is local-only by construction (after embeddings): never enqueued, absent from flushOutbox/fetchAllCloud, excluded from the export, and cleared by importReplace + the full wipe so a restore can’t strand a job on reused note ids. It holds the orchestration state (per-source batches, cursor, retry/pause status, applied-idea roles, counts) for the resumable bulk idea-discovery engine — distinct from the per-record outbox, which collapses-and-deletes on flush and so can’t carry durable job progress. The results (the discovered tags) live on note.tags, which does sync; the job rows never do. No new Postgres table or migration — the only server surface is PR-1’s batched discover_batch proxy action. SCHEMA_VERSION17. See Bulk idea discovery below.

All statements are backed by repository code/config as of 2026-06-20 (SUR-580 device-local discovery-jobs engine; SUR-605 synced-lexicon backfill; SUR-198 cover-art schema).

Canonical entities

EntityLocal representationCloud representationNotes
BookDexie books table with id, title, author, isbn, coverUrl, coverSource, coverResolvedAt, timestamps, deleted flag (src/db.js)Supabase books table with same fields + user_id (FK to auth.users) (supabase/migrations/0001_initial_schema.sql, 0025_sur198_book_covers.sql)Deleting a book cascades soft-deletes to related notes locally before syncing. Cover-art + isbn fields added in SUR-198 (Dexie v10 / migration 0025); resolved client-side from Open Library, last-write-wins like the rest of the row. See Tables → books.
NoteDexie notes table containing text, page, tags[], imagePath, imageDataUrl, source, sourceId, sourceMeta, timestamps, deleted (src/db.js)Supabase notes table storing text, page, tags jsonb, image_path, source, source_id, source_meta, timestamps, deleted (supabase/migrations/0001_initial_schema.sql, 0002_notes_add_source.sql, 0003_notes_add_source_ingest.sql)Dexie keeps imageDataUrl for offline previews; Supabase stores only the storage path. Tags have a multiEntry index locally.
Custom ideaDexie customIdeas with name, description, timestamps, deletedSupabase custom_ideas with same shape + user_id (supabase/migrations/0001_initial_schema.sql)Custom ideas influence AI prompts; renaming cascades to notes via useSettings.
LensDexie lenses table { id, name, leafIds[], combinator: 'AND', createdAt, updatedAt, deleted } (Dexie v15, src/db.js)Supabase lenses table — same fields snake-cased (leaf_ids text[], combinator) + user_id (FK to auth.users) (supabase/migrations/0042_lenses.sql)A user-minted lens: a saved AND-intersection of canon ideas — definition only (name + leafIds), never a node or an edge. Canonical lenses are static code (src/lib/canonicalLenses.js), never synced; the canon: id prefix is reserved (guarded in saveLens) so only user rows persist. Plaintext (no encryption seam). SUR-499 explicit REVOKE/GRANT. See Lenses & collections and Tables → lenses.
CollectionDexie collections table { id, name, createdAt, updatedAt, deleted } (Dexie v16, src/db.js)Supabase collections table + user_id (supabase/migrations/0043_collections.sql)A manual grouping of notes — name only, no note content (plaintext). Deleting a collection cascades soft-deletes to its membership rows (single Dexie transaction). SUR-499 explicit REVOKE/GRANT. See Lenses & collections and Tables → collections.
Collection membershipDexie collection_memberships table { id, noteId, collectionId, createdAt, updatedAt, deleted } (Dexie v16, src/db.js)Supabase collection_memberships join table + user_id; FKs note_id → notes(id) and collection_id → collections(id), both ON DELETE CASCADE (supabase/migrations/0043_collections.sql)A note↔collection query-time join row — not a graph edge and not a node (fabricates no co-occurrence). Deterministic id <collectionId>:<noteId> (membershipId) makes add/remove idempotent and a note at-most-once per collection. Note- and collection-deletes cascade membership soft-deletes. See Lenses & collections and Tables → collection_memberships.
MetaDexie meta table storing apiKey, lastSyncAt (src/db.js)NoneData never leaves the device; API key is local-only (managed users can omit it).
OutboxDexie outbox table of {table, recordId, payload, createdAt} (src/db.js)Not persisted in cloudFlushed via flushOutbox and deleted once Supabase confirms writes.
Embedding (device-local)Dexie embeddings table { noteId, modelVersion, dim, sourceHash, vector, updatedAt } (Dexie v13, src/db.js; sealed/read via src/embeddings/embeddingStore.js)None — never syncedThe first local-only-by-construction table: not enqueued, absent from flushOutbox/fetchAllCloud, excluded from the export. Per-capture int8 vector encrypted at rest with the Master Key (AAD = noteId, src/crypto/byteEncryption.js), seeded on-device (MiniLM via Transformers.js) for future semantic search (SUR-157). Cleared on importReplace + full wipe so vectors aren’t stranded on reused note ids. Dark-launched behind VITE_ON_DEVICE_EMBEDDINGS; generation gated on passkey unlock.
Discovery job (device-local)Dexie discovery_jobs table { id, status, origin, maxIdeas, createdAt, updatedAt, batches[], cursor, preview, appliedDetail, counts } (Dexie v17, src/db.js; helpers saveJob/getJob/getActiveJobs)None — never syncedThe second local-only-by-construction table (SUR-580). Holds the orchestration of a bulk idea-discovery run — per-source batches of note ids, a cursor, retry/pause status, the stored first-batch preview, the role-tagged appliedDetail (for undo), and run counts. Not the outbox (which collapses-and-deletes on flush — no place for durable job progress). Plaintext orchestration metadata, no note content. Not enqueued, absent from flushOutbox/fetchAllCloud, excluded from the export; cleared on importReplace + full wipe so a job can’t strand on reused note ids. A static guard test asserts the table name never appears in src/supabase.js. The run’s output is note.tags, which does sync. See Bulk idea discovery.
AI usageNone (client)Supabase ai_usage_daily table (supabase/migrations/0004_ai_usage_tracking.sql)Tracks managed Anthropic usage only. BYOK calls never appear here. Written exclusively by the Edge Function via rpc('upsert_ai_usage', …); never synced to the client. See AI Usage Tracking.
Managed promptNone (client)Supabase prompts table (supabase/migrations/0027_sur316_prompts_table.sql, seeded 0028)Versioned managed-AI system prompts loaded per call by anthropic-proxy. Service-role-only; never reaches the client. One active row per name. See Prompt versioning.
AI usage eventNone (client)Supabase ai_usage_events table (supabase/migrations/0029_sur316_ai_usage_events.sql)Append-only per-call log tagging each managed call with prompt_name + prompt_version. Best-effort analytics sink (does not gate quota); written via rpc('insert_ai_usage_event', …). SUR-316.
E2EE key wrapperNone (client)Supabase wrapped_key_blobs tableOne row per enrolled device. Columns include: id (uuid), user_id (FK), type (prf-v1 | transfer-v1 | pin-v1), blob (base64 encrypted MK), is_active (bool), device_label (text — "Platform · Browser" from getDeviceLabel(), set at enrolment, never updated), created_at (bigint Unix ms — server-stamped via DEFAULT ((extract(epoch from now()) * 1000)::bigint) since migration 0014; client code no longer supplies this for new inserts). Server-side only — not part of the Dexie sync model. Managed entirely through useKeyManagement + src/supabase.js. Transfer-v1 freshness enforced by the select_fresh_transfer_blob RPC rather than client-side age checks.

Schema evolution & parity

  • Dexie evolves through ten versions, introducing updatedAt, soft deletes, imagePath, source, sourceId, sourceMeta, and a multiEntry tags index (v2–v6), then passkey/encryption meta (v7), Master Key meta (v8), note chapter (v9), and book cover-art + isbn fields (v10, SUR-198) (src/db.js).
  • Supabase schema mirrors these additions via the migration set in supabase/migrations/*.sql plus the declarative scripts/schema-contract.js; npm run check:schema executes scripts/check-schema.js to compare a live database to that contract.
  • SCHEMA_VERSION now equals 17 (v11 SUR-505 tag rename, v12 SUR-84 note_links, v13 SUR-527 device-local embeddings, v14 SUR-597 contemporary-canon swap, v15 SUR-598 lenses, v16 SUR-598 collections + collection_memberships, v17 SUR-580 device-local discovery_jobs), and src/test/export-import.test.js fails if someone bumps the Dexie version without updating exports, so backups retain provenance/tag/cover fields and the lexicon tables (lenses, collections, collection_memberships all serialize). The v13 embeddings and v17 discovery_jobs tables are intentionally excluded from the export (derived / device-local orchestration; a regression test asserts buildExport omits each).
  • notes_source_id_idx is intentionally non-unique to avoid rejecting duplicate highlights during cross-device imports (see comments in supabase/migrations/0003_notes_add_source_ingest.sql).

Data flows

flowchart LR
Capture[Capture / Manual Input] --> Ingest[ingest adapters\nsrc/ingest/*.js]
Ingest --> NoteForm[useNoteForm\nsrc/hooks/useNoteForm.js]
NoteForm --> Dexie[Dexie tables\nsrc/db.js]
NoteForm --> Outbox[Outbox queue]
Dexie --> Export[Export JSON\nbuildExport()]
Dexie --> LoadAll[loadAll() -> UI]
Outbox --> SupabaseUpsert[cloudWrite/flushOutbox\nsrc/hooks/useAuth.js]
SupabaseUpsert --> SupabaseDB[Supabase tables\nsupabase/migrations/*.sql]
SupabaseDB --> Merge[mergeCloudRecords]
Merge --> Dexie
SupabaseDB --> Storage[Storage bucket note-images\nsrc/supabase.js]

Lifecycle details

  • Create/update: Hooks call saveBook, saveNote, or saveCustomIdea, which stamp updatedAt and deleted: 0. Immediately afterwards, cloudWrite attempts an upsert; offline writes enqueue payload copies in outbox with recordId to enable collapse (src/db.js, src/hooks/useAuth.js).
  • Delete: Books and notes use soft deletes (deleted: 1); book deletes also soft-delete child notes inside a Dexie transaction. Hard deletes exist only for note cleanup (hardDeleteNote). When syncing, Supabase rows carry deleted flags as booleans.
  • Images: Captured images are compressed and stored inline (imageDataUrl); if Supabase upload succeeds, imagePath is saved via setNoteImagePath, but the original base64 blob stays local for offline lightbox (src/hooks/useNoteForm.js). Sync download fills missing previews using Supabase Storage (src/hooks/useAuth.js).
  • Tags & provenance: AI-discovered tags populate note.tags; provenance is stored via source, sourceId, and sourceMeta. Dexie maintains a multiEntry index on tags for efficient idea filtering. Supabase stores tags as jsonb.
  • Import/export: buildExport packages books/notes/ideas — plus note_links, lenses, collections, and collection_memberships — with the _syntopicon flag; parseImport validates input (the lexicon tables default to [] so a pre-v15/v16 export imports cleanly). importMerge inserts only new IDs, while importReplace wipes and restores tables — and explicitly clears all three lexicon tables before restoring, or a replace would leave the prior library’s saved lenses/collections shadowing the incoming set (the SUR-84 regression class: a synced table missing from importReplace silently survives a restore). The device-local embeddings and discovery_jobs tables are excluded from the export but are cleared by importReplace and the full local wipe (SUR-527 / SUR-580), so a replace can’t strand vectors or a job on reused note ids. Settings UI exposes merge vs. replace flows (src/hooks/useSettings.js).
  • Outbox collapse: collapseOutboxItems merges queued payloads by (table, recordId), enforcing delete stickiness and chronological ordering before Supabase writes (src/supabase.js, tests in src/test/outbox.test.js).

Lenses & collections — the synced lexicon (SUR-598)

SUR-598 added a lexicon layer on top of the idea graph: lenses (saved AND-intersections of canon ideas) and collections (manual groupings of notes), backed by three new synced tables. lenses landed in PR1 (Dexie v15, migration 0042_lenses.sql); collections + collection_memberships in PR2 (Dexie v16, migration 0043_collections.sql); PR3a (useLenses/useCollections) added the create/rename/delete UI but no new table or mechanism. All three are per-user, RLS own-rows, plaintext (no encryption seam — they hold definitions and references, never note content).

Load-bearing invariant — read-time resolution, never materialised. A lens and a collection are resolved at read time and are never stored as nodes or edges in the idea graph. A lens persists only { name, leafIds, combinator: 'AND' }; the set of notes it matches is computed on demand from the live note set, never written back. A collection’s membership is a real row, but it is a query-time join, not a graph edge: it fabricates no co-occurrence between the notes it groups. This is the deliberate contrast with note_links (SUR-84, migration 0034), which is a first-class graph edge with a relationType. Conflating the two — materialising a lens/collection as edges — would inject phantom co-occurrence into idea analytics, so the invariant is load-bearing.

Canonical vs. user lenses. The six canonical lenses are static code (src/lib/canonicalLenses.js, Object.freezed, source: 'canon') and are never synced — they ship with the app. Only user-minted lenses become lenses rows. The canon: id prefix is reserved: saveLens throws if a user lens id starts with canon:, so a synced row can never shadow a canonical one.

Membership identity. A membership id is deterministicmembershipId(collectionId, noteId) returns `${collectionId}:${noteId}` (src/db.js). Because note/collection ids are colon-free uid() values, this is unambiguous and makes a note at most once per collection; add/remove are idempotent and converge across devices without duplicate rows (a dev-only guard logs if a : ever appears in either component).

Sync model these ride. Identical to the rest of the offline-first stack:

  • Write: hooks stamp updatedAt/deleted and enqueue payloads in outbox; flushOutbox routes lensesupsertLens, collectionsupsertCollection, collection_membershipsupsertCollectionMembership (src/supabase.js), all plaintext .upsert()s.
  • Read/merge: fetchAllCloud pulls each table via fetchOptionalTable (a resilient fetch — a client syncing before the migration is applied degrades to “none” instead of bricking the whole sync), and mergeCloudRecords applies last-write-wins by updated_at (src/db.js).
  • Delete (tombstones): soft-delete only. Local state holds live rows; the cloud keeps the tombstone so a delete converges across devices. The RLS policies deliberately omit an AND deleted = false filter precisely so an owner can still pull their own tombstones via fetchSince.
  • Cascade: deleting a collection soft-deletes its memberships (deleteCollectionsoftDeleteMembershipRows, one Dexie transaction); deleting a note soft-deletes its memberships (softDeleteMembershipsForNote, called from useNoteActions). At the Postgres layer the membership FKs are ON DELETE CASCADE — the hard-delete backstop behind the app’s soft-delete.

Export / import coverage. All three tables are serialized by buildExport and round-tripped by parseImport/importMerge. Critically, importReplace clears all three before restoring — omitting any one would let the prior library’s lenses/collections silently survive a “replace” restore (the SUR-84 export-path regression class). Pre-v15/v16 exports import cleanly because each table defaults to [].

SUR-499 grants. Each table follows the explicit-grant idiom (REVOKE ALL from public/anon/authenticated, then GRANT SELECT, INSERT, UPDATE, DELETE to authenticated and GRANT ALL to service_role) — ahead of the Oct 30 2026 Data-API cutover.

Column-level shapes, indexes, and RLS predicates are catalogued at Tables → lenses, collections, collection_memberships. The originating decisions are tracked in surfc/RISKS_GAPS_ASSUMPTIONS.md (the SUR-598 entry); this page closes the canonical-documentation gap that entry flagged.

Incremental pull — server watermark + LWW guard (SUR-739 / SUR-740 / SUR-652)

The offline-first model above (last-write-wins by client updated_at, soft-delete tombstones) is unchanged. What changes is the transport around it — three defects closed server-side + in the pull cursor:

  • SUR-740 — server LWW guard. A BEFORE UPDATE trigger (t01_lww_guard) on all 8 synced tables rejects an UPDATE whose updated_at is strictly older than the stored row. updated_at stays the client’s edit time (LWW is still client-clock — deliberately not a server-time trigger); equality passes. This stops a delayed / replayed / concurrent flush from clobbering a newer server row — the server-side complement to braird-core’s client-side outbox rebase (SUR-736).
  • SUR-739 — server visibility watermark. Each synced table gains a server-assigned, monotonic change_seq bigint, stamped by t02_change_seq on every insert/update (it fires after the guard, so a rejected write doesn’t bump it). It is distinct from updated_at: a row becomes visible at flush, not at enqueue, so a cursor over change_seq delivers every row the server holds — closing the delayed/offline-flush hole where a row stamped before a puller’s updated_at cursor advanced was never fetched. change_seq is server-only (never in an upsert payload, so the synced column set is unchanged).
  • SUR-652 — keyset pagination + per-table cursors. fetchSince(table, sinceSeq) now pages by change_seq (.gt(cursor).order('change_seq').limit(1000) until a short page), and syncFromCloud keeps a per-table cursor (meta key lastSeq:<table>) instead of one global lastSyncAt. Exclusive .gt is exact — the watermark is unique + monotonic, so no boundary row is missed or duplicated (the old inclusive updated_at >= cursor both re-pulled the boundary row and silently capped at max_rows, skipping the overflow forever). The legacy global lastSyncAt is kept + still written, purely as a rollback fallback; an absent per-table cursor triggers a one-time cold-start full re-pull (which also recovers rows the old cursor holes — including the fetchOptionalTable missing-table skip — had dropped).

Client / core split + rollback. The migrations + PWA cursor land together in surfc/; braird-core consumes the same change_seq watermark in its own pull cursor as a follow-up. The migrations are additive + backward-compatible (an un-upgraded client keeps working on the updated_at path; the guard passes its always-fresh writes), so they are rollback-safe. Accepted residual: two devices writing different values to the same row in the same millisecond still don’t converge (strict-> LWW keeps local; NTP-bounded, pathological) — a deterministic tie-break would be wire-visible and is out of scope.

Bulk idea discovery — the device-local job engine (SUR-580)

A fresh import (e.g. 1,000+ Readwise highlights) lands a wall of orphan notes — note.tags === []. The only tagging path before SUR-580 was per-note discovery, a 1,000-call cost bomb against a 50-call/month Free cap. SUR-580 makes tagging the backlog feasible by batching the managed-AI call and driving it from a resumable, device-local background job. It splits across three PRs: PR-1 (GCE) added the batched discover_batch proxy action — up to 25 notes per Anthropic call (SUR-582), one {core, supporting} result per note, eval-gated on the core-anywhere bar; PR-2 (CE, this section) is the client job engine; PR-3 wires the activation surfaces (import prompt, Commonplace chip, Settings action).

Pure engine / thin driver. Every state decision lives in a pure, unit-tested module (src/lib/bulkDiscovery.js): snapshotOrphans, groupAndPack (the SUR-580 packing rule — group by source, chunk a source > B, greedily pack chunks into ≤ B-note batches, never splitting a chunk), computeAppliedTags, applyResults, processBatch, and the runner reducers (markInFlight / onBatchError / finalizeJob / resumeJob). The React surface (src/hooks/useBulkDiscovery.js) owns only the imperative glue: persistence between batches, the sequential loop, the single-job guard, resume-on-mount, and telemetry.

discovery_jobs is orchestration state, NOT the outbox. The outbox is a per-record cloud-sync queue that collapses and deletes rows on flush — there is nowhere to keep batch progress / retry / resume. So the engine adds a dedicated Dexie store (v17) that tracks the orchestration; the resulting tag writes still flow through the existing editNoteoutbox path for persistence + sync. A job row is { id, status, origin, maxIdeas, batches:[{ batchId, noteIds[], status, attempts }], cursor, preview, appliedDetail:{ [noteId]:{ added:[{name, role}], preserved[] } }, counts }. The applied-idea role (core vs supporting) lives in appliedDetailnever in note.tags, which stays a flat string[].

Device-local, never synced — by construction. discovery_jobs is the second table in this class (after embeddings): a job references note ids by batch, so it would strand on another device’s library; the run’s output (the tags) syncs through note.tags, the job rows never do. It is omitted from flushOutbox / fetchAllCloud / buildExport / parseImport / importMerge, and cleared by importReplace + the full wipe (the SUR-84 regression class — a device-local table missing from importReplace would let a stale job survive a restore on reused ids). The never-sync guarantee is enforced negatively by a static test (discovery-jobs-sync-exclusion.test.js) that fails if the literal discovery_jobs ever appears in src/supabase.js.

The apply flow — preview gate, then auto-apply with undo. core-anywhere ≈ 0.83 on the realistic arm means applying the whole emitted small set puts the right idea in the user’s tags ~5-in-6 times; the bar (founder, 2026-06-19) is to auto-apply — no per-note approval wall — bounded by a visible undo. So:

  • Preview first. startJob snapshots orphans, packs them, persists the row in preview status, and runs only the first source’s batch — fetched and shown, not applied — so the user sees real proposed tags on their own library before anything commits.
  • Auto-apply on confirm. confirmPreview applies that stored first batch (no second call), then the runner drains the rest, auto-applying each batch’s emitted set as it completes. Apply reuses the single-note recipe (computeAppliedTags mirrors useNoteActions.rediscoverIdeas): preserve the note’s non-canonical tags, merge core + supporting, dedup, and persist via editNote (the same encrypt-write + outbox-sync + state primitive — the engine receives it rather than re-implementing persistence).
  • Undo is the safety net. Per-note revert (to the pre-job preserved tags) and per-idea removal, both via editNote. The core-vs-supporting undo rate (bulk_discovery_idea_undone{role}) is the post-ship signal that validates the auto-apply bar and decides any fast-follow.

Resumable, idempotent, quota-aware. Each batch persists, so a crash or navigation resumes from disk (the mount effect rewinds an in_flight batch to pending via resumeJob). livePending re-reads each note and idempotently skips any already tagged — by a prior partial run, a cross-device sync, or a concurrent single-note re-discover — so re-running never double-tags. A transient error retries ≤ 3× then marks the batch failed and advances the cursor (a poison batch never blocks the backlog); a 429 pauses the job (resumable next month / on upgrade) rather than failing it. A paused run is resumed by an explicit continuePaused action (a runner-paused job resumes its drain; a job paused before its first preview re-enters the preview gate) — auto-resuming on mount would just re-hit the cap.

PII & guardrails. Each note’s text is PII-redacted on-device before it leaves the client (runSafetyChecks / redactMatches, the non-interactive bulk analogue of the single-note review sheet); the server-side Azure prompt-injection shield (PR-1) screens each note individually and drops a poisoned one without sinking the batch. The batched call counts as a single request against the existing monthly cap — no new meter.

The originating decisions are tracked in surfc/RISKS_GAPS_ASSUMPTIONS.md (the SUR-580 PR-1 entry); the client engine and its job table are documented here.

AI Usage Tracking

ai_usage_daily (supabase/migrations/0004_ai_usage_tracking.sql) is the authoritative store for per-user managed Anthropic usage. It is server-side only and is not part of the offline sync model.

Table: ai_usage_daily

ColumnTypeNotes
iduuidPK, gen_random_uuid()
user_iduuidFK -> auth.users, NOT NULL
action_typetextConstrained to 'transcribe' or 'discover'
window_daydateUTC calendar date of the request
request_countintegerNumber of successful requests in this window
input_tokensbigintAccumulated input tokens from Anthropic responses
output_tokensbigintAccumulated output tokens from Anthropic responses
created_atbigintUnix ms — set on first upsert for this window
updated_atbigintUnix ms — refreshed on every subsequent upsert

No deleted column: this table is not part of the sync model and rows are never removed by client code.

Tracking unit

Both request count and token counts (input and output separately) are recorded per row. This supports simple request-count quotas today and token-based billing in future paid tiers without a schema change.

Reset boundaries

  • Daily window: the window_day column (UTC calendar date).
  • Monthly window: derived in query — no separate column required:
    SELECT date_trunc('month', window_day AT TIME ZONE 'UTC') AS month,
    SUM(request_count), SUM(input_tokens), SUM(output_tokens)
    FROM ai_usage_daily
    WHERE user_id = $1
    GROUP BY 1 ORDER BY 1;

Write path

Rows are written exclusively by the anthropic-proxy Supabase Edge Function using the service role key. After a successful Anthropic response, recordUsage calls rpc('upsert_ai_usage', …) with (user_id, action_type, window_day) so the function can increment request_count and accumulate token totals atomically (supabase/functions/anthropic-proxy/index.ts, supabase/migrations/0005_upsert_ai_usage_fn.sql). No client-side writes occur, and BYOK calls never invoke this path.

The function enforces a per-user monthly limit (SUR-92) by summing request_count across all action types for the current month (getMonthlyUsage applies no action_type filter — transcribe and discover draw from the same shared cap) and comparing the total to the value resolved by getResolvedMonthlyLimit — see Per-user quota — user_profiles below for the SSoT model. On quota overflow the function returns HTTP 429 and emits a managed_ai_rate_limit_hit PostHog event via fireAndForget() (non-blocking — does not pad the 429 response with PostHog network latency). On Supabase SELECT/RPC errors it throws and returns HTTP 500, so quota enforcement never fails open. A missing user_profiles row returns 500 profile_missing rather than silently using a default constant.

Per-user quota — user_profiles

user_profiles.month_limit is the single source of truth for each user’s managed-AI monthly cap. The Edge Function, the UsageBar in the Settings UI, and any future admin tooling must all resolve the cap from this column — never from an env var, hard-coded constant, or client-side cache.

ColumnTypeNotes
month_limitint4DEFAULT 50 (SUR-92). Authoritative monthly cap before any override.
month_usageint4Numerator snapshot maintained by the trg_ai_usage_month_delta trigger (function: apply_ai_usage_month_delta) on ai_usage_daily (SUR-73, captured in migration 0013).
percent_userealGenerated month_usage / month_limit — what ProfileScreen.UsageBar reads.
allocation_overrideint4 (nullable)SUR-92 additive boost. NULL or 0 → no boost.
allocation_override_expires_atdate (nullable)Date the override stops applying. expires_at >= CURRENT_DATE is still active. NULL means “no override” — both the value and the expiry must be set for a boost to apply. A permanent override is expressed by setting expires_at to a far-future date.

Resolution (mirrored in supabase/functions/anthropic-proxy/index.ts → getResolvedMonthlyLimit):

monthly_limit_user = month_limit
+ (allocation_override > 0
AND allocation_override_expires_at >= CURRENT_DATE
? allocation_override : 0)

Numerator caveat — apply_ai_usage_month_delta has no DELETE branch. Deleting an ai_usage_daily row will not decrement month_usage. In practice rows are never deleted (the proxy only inserts and upserts), so this is not a current problem; flag here for anyone introducing a cleanup job.

Insert paths. The canonical bootstrap is the handle_new_auth_user() Postgres trigger (supabase/migrations/0020_generic_profile_creation_trigger.sql, SUR-362) — it fires on every auth.users INSERT and creates a user_profiles row inheriting the month_limit DEFAULT (50), user_tier DEFAULT (free), and a name derived from raw_user_meta_data.full_name with split_part(email, '@', 1) as fallback. The approve-waitlist Edge Function upsert is a best-effort overlay on top — post-SUR-362 the row already exists for any signed-up user, so the upsert lands approval-derived columns (waitlist name, month_usage snapshot, member_since) rather than creating the row. Users cannot edit their own quota: the user_profiles UPDATE policy restricts non-service-role writes to display_name and profile_bio only. The SUR-230 admin dashboard will write allocation_override via the service role.

BYOK exclusion

BYOK calls still go browser -> Anthropic directly (src/api.js), so they never reach the Edge Function. Managed usage counters therefore exclude BYOK activity by architectural separation. This also means BYOK users bypass quota enforcement entirely and remain responsible for their own Anthropic billing.

RLS

Users may SELECT their own rows (foundation for a future self-service usage page). INSERT and UPDATE are restricted to the service role — no RLS policy grants clients write access.

Future hybrid design

This aggregate model is sufficient for quota enforcement and monetisation reporting. If full per-request audit trails are needed later (e.g. debugging, dispute resolution), add an append-only ai_usage_log event table as an additive migration alongside this one. The aggregate table’s structure would not need to change.

Entitlements — user_profiles (SUR-327, Phase A of SUR-235)

Phase A of SUR-235 generalises the SUR-92 SSoT pattern from “managed-AI usage” to the full v1.5 paygated-features matrix. The resolver in supabase/functions/_shared/entitlements.ts → getResolvedEntitlements is the single authority for what a user is entitled to. The me-entitlements Edge Function exposes it to the client; useEntitlements() reads from there. No client-side cache, env var, or duplicate constants module may shadow the resolver — the React app deliberately does not own the Free / Pro defaults.

Columns introduced or tightened in migration 0017_entitlements_phase_a.sql:

ColumnTypeNotes
user_tiertextNOT NULL DEFAULT 'free', CHECK (user_tier IN ('free', 'pro')). Originally added by SUR-318; locked down by SUR-327. SSoT for which tier the user is on. The Stripe webhook (SUR-85) will write here on subscribe / cancel.
tier_started_attimestamptzNOT NULL DEFAULT now(). Drives the SUR-328 lapsed-Pro 30-day grace window.
image_storage_bytes_usedbigintNOT NULL DEFAULT 0. Running counter maintained transactionally by the image-upload Edge Function via adjust_image_storage_bytes. Backfilled once from storage.objects in 0017.

The resolver returns:

type Entitlements = {
tier: 'free' | 'pro'
managedAiMonthlyLimit: number | null // null = unlimited; else month_limit + override
customIdeasLimit: number | null
maxDevices: number | null
imageStorageBytesLimit: number | null
capabilities: {
reDiscoverIdeas: boolean // the strong wall (SUR-235)
importReadwise: boolean
importKindle: boolean
exportJson: boolean
}
}

managedAiMonthlyLimit honours the SUR-92 column path (month_limit + valid allocation_override) when populated; the other fields come from FREE_DEFAULTS / PRO_DEFAULTS in _shared/entitlements.ts.

adjust_image_storage_bytes(user_id, delta) → bigint

SECURITY DEFINER function added in 0017 so the image-upload Edge Function can atomically add or subtract bytes from user_profiles.image_storage_bytes_used without a client-side read-modify-write race. EXECUTE is granted to service_role only — anon and authenticated are explicitly revoked, so a malicious caller cannot inflate or zero a peer’s quota. The function is paired with the post-upload increment in the image-upload handler; the Edge Function also issues a compensating storage delete if the increment fails so the counter never lags reality.

Phase A is observation-only

The four non-quota gates (re_discover_ideas, custom_ideas_limit, max_devices, image_storage_bytes) are wired up client-side and emit would_have_blocked PostHog events but do not alter behaviour. SUR-262 observes for two weeks; SUR-328 flips the predicates to enforcing 4xx responses keyed to whichever capability the data validates as the strong wall. The SUR-92 quota gate continues to enforce per-user month_limit unchanged.

Stripe billing — user_profiles + stripe_webhook_events (SUR-85)

SUR-85 adds the data-plane plumbing behind paid Pro subscriptions. The corresponding Edge Functions (create-checkout-session, stripe-webhook, create-billing-portal-session) are documented in System Architecture → Stripe billing flows.

user_profiles — billing columns

Migration 0019_stripe_billing.sql adds four additive columns. All four are nullable; existing users carry NULLs until their first checkout.

ColumnTypeNotes
stripe_customer_idtext (nullable)Stripe cus_… identifier. Populated lazily by create-checkout-session on first checkout; persists across cancel + resubscribe so the same Stripe customer is reused. Partial UNIQUE index WHERE NOT NULL prevents two profiles from binding to the same customer.
stripe_subscription_idtext (nullable)Stripe sub_… of the active or most-recently-cancelled subscription. Updated by the webhook on customer.subscription.created/updated; retained on cancellation for audit correlation.
subscription_statustext (nullable)Lifecycle status mirrored from Stripe. Explicit CHECK enumerates the seven values we expect: active, trialing, past_due, canceled, paused, incomplete, incomplete_expired. A future Stripe-side addition (e.g. unpaid) surfaces as a webhook 500 → Stripe retry → prompt for a follow-up migration; we deliberately do not accept unknown statuses silently.
subscription_current_period_endtimestamptz (nullable)End of the current Stripe billing period. Drives the lapsed-Pro 30-day grace window in the resolver (see below).

user_tier (from SUR-327) remains the SSoT for billing intent. The webhook flips it to 'pro' on the first transition into an active state and never writes 'free' on cancellation — the deferred-grace check in the resolver owns the pro→free decision.

stripe_webhook_events — idempotency ledger

CREATE TABLE stripe_webhook_events (
event_id text PRIMARY KEY, -- Stripe's evt_... id
event_type text NOT NULL,
received_at timestamptz NOT NULL DEFAULT now(),
processed_at timestamptz, -- NULL until handler completes
payload jsonb NOT NULL -- full Stripe event for audit
);
CREATE INDEX stripe_webhook_events_type_received_idx
ON stripe_webhook_events (event_type, received_at DESC);

Stripe retries deliveries on any non-2xx response, so the webhook handler must be idempotent. The dedup primitive is:

INSERT INTO stripe_webhook_events (event_id, event_type, payload)
VALUES (...)
ON CONFLICT (event_id) DO NOTHING
RETURNING event_id;

Empty RETURNING means “already processed” — the handler short-circuits to a 200 without re-applying the event. After dispatch completes the row’s processed_at is stamped; rows where processed_at stays NULL flag a handler that crashed mid-dispatch (Stripe will retry, but operators may want to know).

Ledger reclaim — processed_at IS NULL. A row exists in two valid states: processed_at IS NOT NULL (handler completed, retries should short-circuit to a 200 deduplicated) and processed_at IS NULL (the INSERT succeeded but the dispatch handler crashed before the marker was written). On a Stripe retry, the upsert re-conflicts; the function then re-reads processed_at. NULL → re-run dispatch (the prior delivery’s state changes never landed; failing to retry would silently strand a paid subscription event). NOT NULL → skip dispatch and ack.

Concurrency note. Two parallel deliveries of the same event.id (rare but possible) could both observe processed_at IS NULL and both re-dispatch. Each handler is idempotent at the user_profiles level (UPDATE with stable inputs) so the duplicate work is harmless. A stricter contention guard via SELECT … FOR UPDATE was deferred — bring it back if telemetry surfaces a race in production.

RLS: the table is service-role-only. RLS is enabled with no policies for authenticated or anon, and the GRANT/REVOKE matrix is locked down at the table level so even an accidentally-permissive future policy would still hit a “permission denied” before reaching RLS. The webhook handler is the only intended writer and reader; payloads contain PII (billing address, card last4) that has no business reaching authenticated client code.

Resolution — deferred grace window

The 30-day lapsed-Pro grace is enforced in getResolvedEntitlements:

effective_tier = user_tier -- usually
if user_tier = 'pro' AND subscription_status = 'canceled':
if subscription_current_period_end + 30 days < now()
OR subscription_current_period_end IS NULL:
effective_tier = 'free'

Why deferred and not column-flip: if Stripe sends a renewal during the grace window (uncommon but possible — e.g. user updates a failing card), the user remains effectively Pro without any restoration write. We avoid a daily pg_cron sweep entirely, the column stays as billing state, and the resolver returns the effective tier. The column-based flip is still possible later if observability tooling needs user_tier='free' to be visible at the row level — all the necessary data is already in subscription_status + subscription_current_period_end.

Defensive NULL handling: subscription_current_period_end IS NULL on a canceled row is an invariant break (the webhook always writes it), but the resolver treats it as “grace already expired” so an upstream bug can never grant indefinite Pro by accident.


Prompt versioning — prompts + ai_usage_events (SUR-316)

Managed-AI system prompts (and their model / max_tokens) are no longer code constants. They live in the service-role-only prompts table and are loaded per call by anthropic-proxy, so a prompt can be hot-swapped or rolled back by row and every call can be sliced by the version that produced it. See Tables → prompts and ai_usage_events for the column catalog, and the Prompt versioning runbook for activation / rollback procedure.

Load path & fail-open

getPrompt(supabaseAdmin, name) (supabase/functions/anthropic-proxy/promptLoader.ts) selects the active rows (where is_active) on the first call of a cold isolate and caches them in-memory for ~5 minutes. The three names are transcribe, discover_canon, and discover_with_custom; the with-custom body is a template carrying a {{customList}} placeholder the call site fills at runtime (renderDiscoverWithCustom). On any DB read error the loader returns the hardcoded SUR-303 constant for that name and marks it fallback: true — the request never blocks (same posture as the Azure guardrails). A fallback records prompt_version = 0 and surfaces _promptFallback: true in the proxy response, which src/api.js / photoAdapter.js expose as promptFallback and the hooks report as a prompt_fallback PostHog event (mirroring _failOpenguardrail_fail_open).

Operational consequences: a new active version takes up to the cache TTL to propagate, and independently per Edge isolate. A spike in prompt_version = 0 means the prompts table is unreachable (loader is failing open), not a quality change.

Per-call telemetry — ai_usage_events

ai_usage_daily (the quota aggregate) and upsert_ai_usage are unchanged. Per-call version telemetry instead lands in the append-only ai_usage_events table via the insert_ai_usage_event RPC, written best-effort alongside the quota upsert inside recordUsage — a failure there is logged and swallowed (analytics must never fail a user request), so ai_usage_events may under-count vs ai_usage_daily.request_count under DB pressure. The same fields also fire to the server-side PostHog event managed_ai_call_succeeded (prompt_name / prompt_version / fail_open / prompt_fallback) — the foundation for the “managed AI quality by prompt_version” insight and the A/B follow-up.

Grants

prompts, ai_usage_events, and insert_ai_usage_event are service-role-only with explicit REVOKE … FROM public, anon, authenticated (SUR-499 convention) and search_path-pinned secdef functions (the 0011 mandate). Migration 0030 additionally closes a pre-existing gap by revoking EXECUTE on upsert_ai_usage from anon / authenticated — Supabase’s default privileges had left it callable directly via PostgREST.


Access control / RLS (SUR-363)

Every user-owned table is RLS-protected. Post-SUR-363 (migration 0021_drop_waitlist_rls_gate.sql, 2026-05-12), the gate is ownership-only — auth.uid() = user_id — across books, notes, custom_ideas, and the note-images storage bucket. The 0007 waitlist-EXISTS overlay that briefly layered on top of those predicates is gone.

TableOperation(s)Policy
booksALLauth.uid() = user_id (USING + WITH CHECK). Renamed back to “users see own books” from the SUR-7 “approved users see own books” alias.
notesALLSame shape. Note ownership is user_id; soft deletes (deleted=1) still respect the predicate.
custom_ideasALLSame shape.
storage.objects (note-images)ALLauth.uid()::text = (storage.foldername(name))[1] — first path segment is the user’s UUID, and the predicate is identical USING + WITH CHECK.
user_profilesSELECTAuthenticated user can read own row only.
user_profilesUPDATEAuthenticated user can update own row only, with column-level restriction enforced client-side: only display_name and profile_bio. month_limit / allocation_override* / user_tier / Stripe columns are service-role writes only.
user_profilesINSERT / DELETENo policy. INSERTs come from the handle_new_auth_user() trigger (SECURITY DEFINER); DELETEs cascade from auth.users.
ai_usage_dailySELECTAuthenticated user reads own rows (foundation for a future usage page).
ai_usage_dailyINSERT / UPDATEService-role only — written exclusively by anthropic-proxy via rpc('upsert_ai_usage', …).
wrapped_key_blobsALLauth.uid() = user_id. Plus the select_fresh_transfer_blob RPC enforces a server-side TTL on transfer-v1 blobs.
stripe_webhook_eventsALLRLS enabled with no policies for authenticated / anon. Service-role-only by table-level GRANT/REVOKE. The payload contains billing PII; no path to authenticated client code.
waitlist_requestsALLService-role-only (legacy table preserved for historical data; the approve-waitlist Edge Function and admin UI are the remaining readers — SUR-369 tracks final retirement).
admin_audit_logSELECT (read policy)Admin-dashboard schema (0039, SUR-289). Read policy admins read audit log — authenticated SELECT gated on the caller’s own user_profiles.role = 'admin'. Append-only by GRANT, not policy: service_role bypasses RLS, so the forensic invariant is held at the privilege layer — service_role is granted SELECT, INSERT only (not ALL), withholding UPDATE/DELETE/TRUNCATE; preceded by an explicit REVOKE ALL because the SUR-499/0033 default ACL grants Dxtm (incl. TRUNCATE) on every new table. authenticated has SELECT only.
fixed_costsALLAdmin spend panel (0039, SUR-289). RLS enabled, no policies. Service-role-only by table-level GRANT/REVOKE; edited via the SUR-377 Edge Function.
cost_snapshotALLAdmin spend panel (0039, SUR-289). RLS enabled, no policies. Service-role-only by table-level GRANT/REVOKE; append-only variable-cost cache.

The relaxation co-deployed with migration 0020’s handle_new_auth_user() trigger — without 0021 a direct signup would hold a valid JWT and a user_profiles row but silently 403 every books / notes / custom_ideas read because no waitlist_requests row exists for them. The two migrations land in the same PR / branch so the regression window between “trigger no longer back-links” and “RLS no longer requires the back-link” never opens in production.

Rollback. docs/runbooks/open-signup-rollback.md covers the five-step ladder. Reverting 0021 is step 5 (rarely needed — step 1 “disable signups in Supabase Studio” closes the door without touching schema and is the default lever).


Confirmed / Assumption / Unknown

  • Confirmed:
    • Dexie schema versions and Supabase table definitions match field-for-field, including provenance data and tags (src/db.js, supabase/migrations/0001_initial_schema.sql, 0003_notes_add_source_ingest.sql).
    • Export/import, custom idea CRUD, and tag rename cascades operate entirely on Dexie tables with subsequent cloud writes (src/hooks/useSettings.js).
    • Outbox queue plus collapseOutboxItems guarantee idempotent replay semantics (tests cover JSON merging).
  • Assumption:
    • Incremental sync filtering (fetchSince) is designed but not actually used by fetchAllCloud. Now implemented (SUR-739 / SUR-652): fetchAllCloud pulls each table via fetchSince keyed on the server change_seq watermark, paged, with per-table checkpoints in meta.lastSeq:<table> — see the “Incremental pull — server watermark + LWW guard” section above.
    • sourceId dedupe is expected to be used by future adapters; current adapters set it to null, so duplicate detection is effectively manual.
    • Storage cleanup for deleted notes is likely manual because no delete API is called.
  • Unknown:
    • Schema consistency still depends on developers running npm run check:schema and applying migrations; the repo has detection tooling but no automated enforcement pipeline (scripts/check-schema.js, supabase/migrations/*.sql).
    • Data retention policies for Supabase Storage (especially for deleted/tombstoned notes) are not codified.
    • Multi-user or shared libraries are not modeled; all tables enforce owner-only visibility with no sharing semantics.