Data Architecture
Data Architecture
Looking for the column-level catalog? See Tables, Functions, Triggers. This page is the architecture narrative — those three catalogs are the source-of-truth mirror of
supabase/migrations/*.sql.
CHANGE SUMMARY
- Updated: Reflected the live managed AI usage writer (anthropic-proxy Edge Function +
ai_usage_daily) and how rate limits are enforced (supabase/functions/anthropic-proxy/index.ts,supabase/migrations/0004_ai_usage_tracking.sql,0005_upsert_ai_usage_fn.sql).- Updated: Clarified how BYOK vs. managed paths influence payload destinations and risks (
src/api.js,src/supabase.js).- Updated (2026-04-29, SUR-92): Added full per-user quota model —
user_profiles.month_limit(default 50),allocation_override,getResolvedMonthlyLimit.getMonthlyUsagesums across all action types. Trigger name confirmed astrg_ai_usage_month_delta(function:apply_ai_usage_month_delta).emitTelemetryis fire-and-forget viafireAndForget().- Updated (2026-05-02, SUR-233): Added
wrapped_key_blobsentity anddevice_labelcolumn; documented E2EE key-wrapper table and how device labels are stored at enrolment time.- Updated (2026-05-02, SUR-237):
wrapped_key_blobs.created_atnow gets a server-side DEFAULT ((extract(epoch from now()) * 1000)::bigint) via migration 0014, removing the client-clock dependency. Addedselect_fresh_transfer_blob(p_user_id, p_max_age_ms)SECURITY DEFINER RPC — the authoritative TTL gate for transfer-v1 blobs;redeemDeviceTransfercalls this instead of the old client-side age check.- Updated (2026-05-07, SUR-85): Stripe billing infrastructure added in migration
0019_stripe_billing.sql. Four additive columns onuser_profiles(stripe_customer_id,stripe_subscription_id,subscription_status,subscription_current_period_end) and a newstripe_webhook_eventsidempotency ledger keyed on Stripe’sevent.id. The lapsed-Pro 30-day grace window is enforced at read time bygetResolvedEntitlementsrather than via a daily cron flip —user_tierstays'pro'aftersubscription.deletedand the resolver returns'free'only oncesubscription_current_period_end + 30 dayshas passed. See Stripe billing below.- Updated (2026-05-12, SUR-358 cluster / SUR-368): Access control relaxed for the open-signup cutover. Migration
0021_drop_waitlist_rls_gate.sql(SUR-363) dropped the waitlist-EXISTS predicate that migration 0007 had layered onto the RLS policies forbooks,notes,custom_ideas, and thenote-imagesstorage bucket. RLS is now ownership-only —auth.uid() = user_idis the sole gate — restoring the 0001 contract. The waitlist gate was redundant the moment SUR-362’shandle_new_auth_user()trigger (migration0020) stopped requiring an approvedwaitlist_requestsrow to materialise auser_profilesrow; without 0021 a direct signup would hold a valid JWT and a profile row but silently 403 every data read. The migrations co-deploy (same PR / branch) so the regression window between “trigger no longer back-links” and “RLS no longer requires the back-link” never opens in production. See Access control / RLS below for the consolidated picture. Rollback procedure documented indocs/runbooks/open-signup-rollback.md(SUR-368).- Updated (2026-06-03, SUR-527): Device-local
embeddingsstore (Dexie v13). The first table that is local-only by construction — never enqueued, absent fromflushOutbox/fetchAllCloud, and excluded from the JSON export. Stores per-capture vectors (seeded on-device for future semantic search, SUR-157) quantized to int8 and encrypted at rest with the Master Key (src/crypto/byteEncryption.js, AAD-bound tonoteId). Generation is dark-launched behindVITE_ON_DEVICE_EMBEDDINGSand gated on passkey unlock; a per-device Settings control governs the model download.SCHEMA_VERSION→ 13. The companion lexical Search view (/search) indexes the already-decrypted in-memory notes (MiniSearch), not this store.- Updated (2026-05-30, SUR-316): Prompt versioning v1. Two new service-role-only tables:
prompts(migration0027, seeded0028) holds the versioned managed-AI system prompts theanthropic-proxyEdge Function loads per call (one active row per name, fail-open to code constants);ai_usage_events(migration0029) is an append-only per-call log tagging each managed call withprompt_name+prompt_version.ai_usage_dailyandupsert_ai_usageare unchanged — quota accounting is untouched;ai_usage_eventsis a parallel analytics sink written best-effort via the newinsert_ai_usage_eventRPC. Migration0030revokesEXECUTEonupsert_ai_usagefromanon/authenticated(pre-existing gap). See Prompt versioning below and the catalog entries: Tables →prompts,ai_usage_events, Functions →insert_ai_usage_event.- Updated (2026-06-16, SUR-605): Synced lexicon tables documented (SUR-598 backfill). Three new synced, per-user, RLS-own-rows tables:
lenses(Dexie v15, migration0042) andcollections+collection_memberships(Dexie v16, migration0043), each with SUR-499 explicit REVOKE/GRANT. Captures the load-bearing invariant — lenses and collections resolve at read time, never stored as nodes or edges — plus the offline-first sync/tombstone/cascade lifecycle and export/import coverage.SCHEMA_VERSION→16. See Lenses & collections below and the catalog entries: Tables →lenses,collections,collection_memberships.- Updated (2026-06-20, SUR-580): Device-local
discovery_jobsstore (Dexie v17) — bulk idea discovery. The second table that is local-only by construction (afterembeddings): never enqueued, absent fromflushOutbox/fetchAllCloud, excluded from the export, and cleared byimportReplace+ the full wipe so a restore can’t strand a job on reused note ids. It holds the orchestration state (per-source batches, cursor, retry/pause status, applied-idea roles, counts) for the resumable bulk idea-discovery engine — distinct from the per-recordoutbox, which collapses-and-deletes on flush and so can’t carry durable job progress. The results (the discovered tags) live onnote.tags, which does sync; the job rows never do. No new Postgres table or migration — the only server surface is PR-1’s batcheddiscover_batchproxy action.SCHEMA_VERSION→17. See Bulk idea discovery below.
All statements are backed by repository code/config as of 2026-06-20 (SUR-580 device-local discovery-jobs engine; SUR-605 synced-lexicon backfill; SUR-198 cover-art schema).
Canonical entities
| Entity | Local representation | Cloud representation | Notes |
|---|---|---|---|
| Book | Dexie books table with id, title, author, isbn, coverUrl, coverSource, coverResolvedAt, timestamps, deleted flag (src/db.js) | Supabase books table with same fields + user_id (FK to auth.users) (supabase/migrations/0001_initial_schema.sql, 0025_sur198_book_covers.sql) | Deleting a book cascades soft-deletes to related notes locally before syncing. Cover-art + isbn fields added in SUR-198 (Dexie v10 / migration 0025); resolved client-side from Open Library, last-write-wins like the rest of the row. See Tables → books. |
| Note | Dexie notes table containing text, page, tags[], imagePath, imageDataUrl, source, sourceId, sourceMeta, timestamps, deleted (src/db.js) | Supabase notes table storing text, page, tags jsonb, image_path, source, source_id, source_meta, timestamps, deleted (supabase/migrations/0001_initial_schema.sql, 0002_notes_add_source.sql, 0003_notes_add_source_ingest.sql) | Dexie keeps imageDataUrl for offline previews; Supabase stores only the storage path. Tags have a multiEntry index locally. |
| Custom idea | Dexie customIdeas with name, description, timestamps, deleted | Supabase custom_ideas with same shape + user_id (supabase/migrations/0001_initial_schema.sql) | Custom ideas influence AI prompts; renaming cascades to notes via useSettings. |
| Lens | Dexie lenses table { id, name, leafIds[], combinator: 'AND', createdAt, updatedAt, deleted } (Dexie v15, src/db.js) | Supabase lenses table — same fields snake-cased (leaf_ids text[], combinator) + user_id (FK to auth.users) (supabase/migrations/0042_lenses.sql) | A user-minted lens: a saved AND-intersection of canon ideas — definition only (name + leafIds), never a node or an edge. Canonical lenses are static code (src/lib/canonicalLenses.js), never synced; the canon: id prefix is reserved (guarded in saveLens) so only user rows persist. Plaintext (no encryption seam). SUR-499 explicit REVOKE/GRANT. See Lenses & collections and Tables → lenses. |
| Collection | Dexie collections table { id, name, createdAt, updatedAt, deleted } (Dexie v16, src/db.js) | Supabase collections table + user_id (supabase/migrations/0043_collections.sql) | A manual grouping of notes — name only, no note content (plaintext). Deleting a collection cascades soft-deletes to its membership rows (single Dexie transaction). SUR-499 explicit REVOKE/GRANT. See Lenses & collections and Tables → collections. |
| Collection membership | Dexie collection_memberships table { id, noteId, collectionId, createdAt, updatedAt, deleted } (Dexie v16, src/db.js) | Supabase collection_memberships join table + user_id; FKs note_id → notes(id) and collection_id → collections(id), both ON DELETE CASCADE (supabase/migrations/0043_collections.sql) | A note↔collection query-time join row — not a graph edge and not a node (fabricates no co-occurrence). Deterministic id <collectionId>:<noteId> (membershipId) makes add/remove idempotent and a note at-most-once per collection. Note- and collection-deletes cascade membership soft-deletes. See Lenses & collections and Tables → collection_memberships. |
| Meta | Dexie meta table storing apiKey, lastSyncAt (src/db.js) | None | Data never leaves the device; API key is local-only (managed users can omit it). |
| Outbox | Dexie outbox table of {table, recordId, payload, createdAt} (src/db.js) | Not persisted in cloud | Flushed via flushOutbox and deleted once Supabase confirms writes. |
| Embedding (device-local) | Dexie embeddings table { noteId, modelVersion, dim, sourceHash, vector, updatedAt } (Dexie v13, src/db.js; sealed/read via src/embeddings/embeddingStore.js) | None — never synced | The first local-only-by-construction table: not enqueued, absent from flushOutbox/fetchAllCloud, excluded from the export. Per-capture int8 vector encrypted at rest with the Master Key (AAD = noteId, src/crypto/byteEncryption.js), seeded on-device (MiniLM via Transformers.js) for future semantic search (SUR-157). Cleared on importReplace + full wipe so vectors aren’t stranded on reused note ids. Dark-launched behind VITE_ON_DEVICE_EMBEDDINGS; generation gated on passkey unlock. |
| Discovery job (device-local) | Dexie discovery_jobs table { id, status, origin, maxIdeas, createdAt, updatedAt, batches[], cursor, preview, appliedDetail, counts } (Dexie v17, src/db.js; helpers saveJob/getJob/getActiveJobs) | None — never synced | The second local-only-by-construction table (SUR-580). Holds the orchestration of a bulk idea-discovery run — per-source batches of note ids, a cursor, retry/pause status, the stored first-batch preview, the role-tagged appliedDetail (for undo), and run counts. Not the outbox (which collapses-and-deletes on flush — no place for durable job progress). Plaintext orchestration metadata, no note content. Not enqueued, absent from flushOutbox/fetchAllCloud, excluded from the export; cleared on importReplace + full wipe so a job can’t strand on reused note ids. A static guard test asserts the table name never appears in src/supabase.js. The run’s output is note.tags, which does sync. See Bulk idea discovery. |
| AI usage | None (client) | Supabase ai_usage_daily table (supabase/migrations/0004_ai_usage_tracking.sql) | Tracks managed Anthropic usage only. BYOK calls never appear here. Written exclusively by the Edge Function via rpc('upsert_ai_usage', …); never synced to the client. See AI Usage Tracking. |
| Managed prompt | None (client) | Supabase prompts table (supabase/migrations/0027_sur316_prompts_table.sql, seeded 0028) | Versioned managed-AI system prompts loaded per call by anthropic-proxy. Service-role-only; never reaches the client. One active row per name. See Prompt versioning. |
| AI usage event | None (client) | Supabase ai_usage_events table (supabase/migrations/0029_sur316_ai_usage_events.sql) | Append-only per-call log tagging each managed call with prompt_name + prompt_version. Best-effort analytics sink (does not gate quota); written via rpc('insert_ai_usage_event', …). SUR-316. |
| E2EE key wrapper | None (client) | Supabase wrapped_key_blobs table | One row per enrolled device. Columns include: id (uuid), user_id (FK), type (prf-v1 | transfer-v1 | pin-v1), blob (base64 encrypted MK), is_active (bool), device_label (text — "Platform · Browser" from getDeviceLabel(), set at enrolment, never updated), created_at (bigint Unix ms — server-stamped via DEFAULT ((extract(epoch from now()) * 1000)::bigint) since migration 0014; client code no longer supplies this for new inserts). Server-side only — not part of the Dexie sync model. Managed entirely through useKeyManagement + src/supabase.js. Transfer-v1 freshness enforced by the select_fresh_transfer_blob RPC rather than client-side age checks. |
Schema evolution & parity
- Dexie evolves through ten versions, introducing
updatedAt, soft deletes,imagePath,source,sourceId,sourceMeta, and a multiEntrytagsindex (v2–v6), then passkey/encryption meta (v7), Master Key meta (v8), notechapter(v9), and book cover-art +isbnfields (v10, SUR-198) (src/db.js). - Supabase schema mirrors these additions via the migration set in
supabase/migrations/*.sqlplus the declarativescripts/schema-contract.js;npm run check:schemaexecutesscripts/check-schema.jsto compare a live database to that contract. SCHEMA_VERSIONnow equals17(v11 SUR-505 tag rename, v12 SUR-84note_links, v13 SUR-527 device-localembeddings, v14 SUR-597 contemporary-canon swap, v15 SUR-598lenses, v16 SUR-598collections+collection_memberships, v17 SUR-580 device-localdiscovery_jobs), andsrc/test/export-import.test.jsfails if someone bumps the Dexie version without updating exports, so backups retain provenance/tag/cover fields and the lexicon tables (lenses,collections,collection_membershipsall serialize). The v13embeddingsand v17discovery_jobstables are intentionally excluded from the export (derived / device-local orchestration; a regression test assertsbuildExportomits each).notes_source_id_idxis intentionally non-unique to avoid rejecting duplicate highlights during cross-device imports (see comments insupabase/migrations/0003_notes_add_source_ingest.sql).
Data flows
flowchart LR Capture[Capture / Manual Input] --> Ingest[ingest adapters\nsrc/ingest/*.js] Ingest --> NoteForm[useNoteForm\nsrc/hooks/useNoteForm.js] NoteForm --> Dexie[Dexie tables\nsrc/db.js] NoteForm --> Outbox[Outbox queue] Dexie --> Export[Export JSON\nbuildExport()] Dexie --> LoadAll[loadAll() -> UI] Outbox --> SupabaseUpsert[cloudWrite/flushOutbox\nsrc/hooks/useAuth.js] SupabaseUpsert --> SupabaseDB[Supabase tables\nsupabase/migrations/*.sql] SupabaseDB --> Merge[mergeCloudRecords] Merge --> Dexie SupabaseDB --> Storage[Storage bucket note-images\nsrc/supabase.js]Lifecycle details
- Create/update: Hooks call
saveBook,saveNote, orsaveCustomIdea, which stampupdatedAtanddeleted: 0. Immediately afterwards,cloudWriteattempts an upsert; offline writes enqueuepayloadcopies inoutboxwithrecordIdto enable collapse (src/db.js,src/hooks/useAuth.js). - Delete: Books and notes use soft deletes (
deleted: 1); book deletes also soft-delete child notes inside a Dexie transaction. Hard deletes exist only for note cleanup (hardDeleteNote). When syncing, Supabase rows carrydeletedflags as booleans. - Images: Captured images are compressed and stored inline (
imageDataUrl); if Supabase upload succeeds,imagePathis saved viasetNoteImagePath, but the original base64 blob stays local for offline lightbox (src/hooks/useNoteForm.js). Sync download fills missing previews using Supabase Storage (src/hooks/useAuth.js). - Tags & provenance: AI-discovered tags populate
note.tags; provenance is stored viasource,sourceId, andsourceMeta. Dexie maintains a multiEntry index ontagsfor efficient idea filtering. Supabase stores tags asjsonb. - Import/export:
buildExportpackages books/notes/ideas — plusnote_links,lenses,collections, andcollection_memberships— with the_syntopiconflag;parseImportvalidates input (the lexicon tables default to[]so a pre-v15/v16 export imports cleanly).importMergeinserts only new IDs, whileimportReplacewipes and restores tables — and explicitly clears all three lexicon tables before restoring, or a replace would leave the prior library’s saved lenses/collections shadowing the incoming set (the SUR-84 regression class: a synced table missing fromimportReplacesilently survives a restore). The device-localembeddingsanddiscovery_jobstables are excluded from the export but are cleared byimportReplaceand the full local wipe (SUR-527 / SUR-580), so a replace can’t strand vectors or a job on reused note ids. Settings UI exposes merge vs. replace flows (src/hooks/useSettings.js). - Outbox collapse:
collapseOutboxItemsmerges queued payloads by(table, recordId), enforcing delete stickiness and chronological ordering before Supabase writes (src/supabase.js, tests insrc/test/outbox.test.js).
Lenses & collections — the synced lexicon (SUR-598)
SUR-598 added a lexicon layer on top of the idea graph: lenses (saved AND-intersections of canon ideas) and collections (manual groupings of notes), backed by three new synced tables. lenses landed in PR1 (Dexie v15, migration 0042_lenses.sql); collections + collection_memberships in PR2 (Dexie v16, migration 0043_collections.sql); PR3a (useLenses/useCollections) added the create/rename/delete UI but no new table or mechanism. All three are per-user, RLS own-rows, plaintext (no encryption seam — they hold definitions and references, never note content).
Load-bearing invariant — read-time resolution, never materialised. A lens and a collection are resolved at read time and are never stored as nodes or edges in the idea graph. A lens persists only { name, leafIds, combinator: 'AND' }; the set of notes it matches is computed on demand from the live note set, never written back. A collection’s membership is a real row, but it is a query-time join, not a graph edge: it fabricates no co-occurrence between the notes it groups. This is the deliberate contrast with note_links (SUR-84, migration 0034), which is a first-class graph edge with a relationType. Conflating the two — materialising a lens/collection as edges — would inject phantom co-occurrence into idea analytics, so the invariant is load-bearing.
Canonical vs. user lenses. The six canonical lenses are static code (src/lib/canonicalLenses.js, Object.freezed, source: 'canon') and are never synced — they ship with the app. Only user-minted lenses become lenses rows. The canon: id prefix is reserved: saveLens throws if a user lens id starts with canon:, so a synced row can never shadow a canonical one.
Membership identity. A membership id is deterministic — membershipId(collectionId, noteId) returns `${collectionId}:${noteId}` (src/db.js). Because note/collection ids are colon-free uid() values, this is unambiguous and makes a note at most once per collection; add/remove are idempotent and converge across devices without duplicate rows (a dev-only guard logs if a : ever appears in either component).
Sync model these ride. Identical to the rest of the offline-first stack:
- Write: hooks stamp
updatedAt/deletedand enqueue payloads inoutbox;flushOutboxrouteslenses→upsertLens,collections→upsertCollection,collection_memberships→upsertCollectionMembership(src/supabase.js), all plaintext.upsert()s. - Read/merge:
fetchAllCloudpulls each table viafetchOptionalTable(a resilient fetch — a client syncing before the migration is applied degrades to “none” instead of bricking the whole sync), andmergeCloudRecordsapplies last-write-wins byupdated_at(src/db.js). - Delete (tombstones): soft-delete only. Local state holds live rows; the cloud keeps the tombstone so a delete converges across devices. The RLS policies deliberately omit an
AND deleted = falsefilter precisely so an owner can still pull their own tombstones viafetchSince. - Cascade: deleting a collection soft-deletes its memberships (
deleteCollection→softDeleteMembershipRows, one Dexie transaction); deleting a note soft-deletes its memberships (softDeleteMembershipsForNote, called fromuseNoteActions). At the Postgres layer the membership FKs areON DELETE CASCADE— the hard-delete backstop behind the app’s soft-delete.
Export / import coverage. All three tables are serialized by buildExport and round-tripped by parseImport/importMerge. Critically, importReplace clears all three before restoring — omitting any one would let the prior library’s lenses/collections silently survive a “replace” restore (the SUR-84 export-path regression class). Pre-v15/v16 exports import cleanly because each table defaults to [].
SUR-499 grants. Each table follows the explicit-grant idiom (REVOKE ALL from public/anon/authenticated, then GRANT SELECT, INSERT, UPDATE, DELETE to authenticated and GRANT ALL to service_role) — ahead of the Oct 30 2026 Data-API cutover.
Column-level shapes, indexes, and RLS predicates are catalogued at Tables → lenses, collections, collection_memberships. The originating decisions are tracked in surfc/RISKS_GAPS_ASSUMPTIONS.md (the SUR-598 entry); this page closes the canonical-documentation gap that entry flagged.
Incremental pull — server watermark + LWW guard (SUR-739 / SUR-740 / SUR-652)
The offline-first model above (last-write-wins by client updated_at, soft-delete tombstones) is unchanged. What changes is the transport around it — three defects closed server-side + in the pull cursor:
- SUR-740 — server LWW guard. A
BEFORE UPDATEtrigger (t01_lww_guard) on all 8 synced tables rejects anUPDATEwhoseupdated_atis strictly older than the stored row.updated_atstays the client’s edit time (LWW is still client-clock — deliberately not a server-time trigger); equality passes. This stops a delayed / replayed / concurrent flush from clobbering a newer server row — the server-side complement to braird-core’s client-side outbox rebase (SUR-736). - SUR-739 — server visibility watermark. Each synced table gains a server-assigned, monotonic
change_seq bigint, stamped byt02_change_seqon every insert/update (it fires after the guard, so a rejected write doesn’t bump it). It is distinct fromupdated_at: a row becomes visible at flush, not at enqueue, so a cursor overchange_seqdelivers every row the server holds — closing the delayed/offline-flush hole where a row stamped before a puller’supdated_atcursor advanced was never fetched.change_seqis server-only (never in an upsert payload, so the synced column set is unchanged). - SUR-652 — keyset pagination + per-table cursors.
fetchSince(table, sinceSeq)now pages bychange_seq(.gt(cursor).order('change_seq').limit(1000)until a short page), andsyncFromCloudkeeps a per-table cursor (metakeylastSeq:<table>) instead of one globallastSyncAt. Exclusive.gtis exact — the watermark is unique + monotonic, so no boundary row is missed or duplicated (the old inclusiveupdated_at >= cursorboth re-pulled the boundary row and silently capped atmax_rows, skipping the overflow forever). The legacy globallastSyncAtis kept + still written, purely as a rollback fallback; an absent per-table cursor triggers a one-time cold-start full re-pull (which also recovers rows the old cursor holes — including thefetchOptionalTablemissing-table skip — had dropped).
Client / core split + rollback. The migrations + PWA cursor land together in surfc/; braird-core consumes the same change_seq watermark in its own pull cursor as a follow-up. The migrations are additive + backward-compatible (an un-upgraded client keeps working on the updated_at path; the guard passes its always-fresh writes), so they are rollback-safe. Accepted residual: two devices writing different values to the same row in the same millisecond still don’t converge (strict-> LWW keeps local; NTP-bounded, pathological) — a deterministic tie-break would be wire-visible and is out of scope.
Bulk idea discovery — the device-local job engine (SUR-580)
A fresh import (e.g. 1,000+ Readwise highlights) lands a wall of orphan notes — note.tags === []. The only tagging path before SUR-580 was per-note discovery, a 1,000-call cost bomb against a 50-call/month Free cap. SUR-580 makes tagging the backlog feasible by batching the managed-AI call and driving it from a resumable, device-local background job. It splits across three PRs: PR-1 (GCE) added the batched discover_batch proxy action — up to 25 notes per Anthropic call (SUR-582), one {core, supporting} result per note, eval-gated on the core-anywhere bar; PR-2 (CE, this section) is the client job engine; PR-3 wires the activation surfaces (import prompt, Commonplace chip, Settings action).
Pure engine / thin driver. Every state decision lives in a pure, unit-tested module (src/lib/bulkDiscovery.js): snapshotOrphans, groupAndPack (the SUR-580 packing rule — group by source, chunk a source > B, greedily pack chunks into ≤ B-note batches, never splitting a chunk), computeAppliedTags, applyResults, processBatch, and the runner reducers (markInFlight / onBatchError / finalizeJob / resumeJob). The React surface (src/hooks/useBulkDiscovery.js) owns only the imperative glue: persistence between batches, the sequential loop, the single-job guard, resume-on-mount, and telemetry.
discovery_jobs is orchestration state, NOT the outbox. The outbox is a per-record cloud-sync queue that collapses and deletes rows on flush — there is nowhere to keep batch progress / retry / resume. So the engine adds a dedicated Dexie store (v17) that tracks the orchestration; the resulting tag writes still flow through the existing editNote → outbox path for persistence + sync. A job row is { id, status, origin, maxIdeas, batches:[{ batchId, noteIds[], status, attempts }], cursor, preview, appliedDetail:{ [noteId]:{ added:[{name, role}], preserved[] } }, counts }. The applied-idea role (core vs supporting) lives in appliedDetail — never in note.tags, which stays a flat string[].
Device-local, never synced — by construction. discovery_jobs is the second table in this class (after embeddings): a job references note ids by batch, so it would strand on another device’s library; the run’s output (the tags) syncs through note.tags, the job rows never do. It is omitted from flushOutbox / fetchAllCloud / buildExport / parseImport / importMerge, and cleared by importReplace + the full wipe (the SUR-84 regression class — a device-local table missing from importReplace would let a stale job survive a restore on reused ids). The never-sync guarantee is enforced negatively by a static test (discovery-jobs-sync-exclusion.test.js) that fails if the literal discovery_jobs ever appears in src/supabase.js.
The apply flow — preview gate, then auto-apply with undo. core-anywhere ≈ 0.83 on the realistic arm means applying the whole emitted small set puts the right idea in the user’s tags ~5-in-6 times; the bar (founder, 2026-06-19) is to auto-apply — no per-note approval wall — bounded by a visible undo. So:
- Preview first.
startJobsnapshots orphans, packs them, persists the row inpreviewstatus, and runs only the first source’s batch — fetched and shown, not applied — so the user sees real proposed tags on their own library before anything commits. - Auto-apply on confirm.
confirmPreviewapplies that stored first batch (no second call), then the runner drains the rest, auto-applying each batch’s emitted set as it completes. Apply reuses the single-note recipe (computeAppliedTagsmirrorsuseNoteActions.rediscoverIdeas): preserve the note’s non-canonical tags, mergecore+supporting, dedup, and persist viaeditNote(the same encrypt-write + outbox-sync + state primitive — the engine receives it rather than re-implementing persistence). - Undo is the safety net. Per-note revert (to the pre-job
preservedtags) and per-idea removal, both viaeditNote. The core-vs-supporting undo rate (bulk_discovery_idea_undone{role}) is the post-ship signal that validates the auto-apply bar and decides any fast-follow.
Resumable, idempotent, quota-aware. Each batch persists, so a crash or navigation resumes from disk (the mount effect rewinds an in_flight batch to pending via resumeJob). livePending re-reads each note and idempotently skips any already tagged — by a prior partial run, a cross-device sync, or a concurrent single-note re-discover — so re-running never double-tags. A transient error retries ≤ 3× then marks the batch failed and advances the cursor (a poison batch never blocks the backlog); a 429 pauses the job (resumable next month / on upgrade) rather than failing it. A paused run is resumed by an explicit continuePaused action (a runner-paused job resumes its drain; a job paused before its first preview re-enters the preview gate) — auto-resuming on mount would just re-hit the cap.
PII & guardrails. Each note’s text is PII-redacted on-device before it leaves the client (runSafetyChecks / redactMatches, the non-interactive bulk analogue of the single-note review sheet); the server-side Azure prompt-injection shield (PR-1) screens each note individually and drops a poisoned one without sinking the batch. The batched call counts as a single request against the existing monthly cap — no new meter.
The originating decisions are tracked in surfc/RISKS_GAPS_ASSUMPTIONS.md (the SUR-580 PR-1 entry); the client engine and its job table are documented here.
AI Usage Tracking
ai_usage_daily (supabase/migrations/0004_ai_usage_tracking.sql) is the authoritative
store for per-user managed Anthropic usage. It is server-side only and is not part of the
offline sync model.
Table: ai_usage_daily
| Column | Type | Notes |
|---|---|---|
id | uuid | PK, gen_random_uuid() |
user_id | uuid | FK -> auth.users, NOT NULL |
action_type | text | Constrained to 'transcribe' or 'discover' |
window_day | date | UTC calendar date of the request |
request_count | integer | Number of successful requests in this window |
input_tokens | bigint | Accumulated input tokens from Anthropic responses |
output_tokens | bigint | Accumulated output tokens from Anthropic responses |
created_at | bigint | Unix ms — set on first upsert for this window |
updated_at | bigint | Unix ms — refreshed on every subsequent upsert |
No deleted column: this table is not part of the sync model and rows are never
removed by client code.
Tracking unit
Both request count and token counts (input and output separately) are recorded per row. This supports simple request-count quotas today and token-based billing in future paid tiers without a schema change.
Reset boundaries
- Daily window: the
window_daycolumn (UTC calendar date). - Monthly window: derived in query — no separate column required:
SELECT date_trunc('month', window_day AT TIME ZONE 'UTC') AS month,SUM(request_count), SUM(input_tokens), SUM(output_tokens)FROM ai_usage_dailyWHERE user_id = $1GROUP BY 1 ORDER BY 1;
Write path
Rows are written exclusively by the anthropic-proxy Supabase Edge Function using the
service role key. After a successful Anthropic response, recordUsage calls
rpc('upsert_ai_usage', …) with (user_id, action_type, window_day) so the function can
increment request_count and accumulate token totals atomically (supabase/functions/anthropic-proxy/index.ts,
supabase/migrations/0005_upsert_ai_usage_fn.sql). No client-side writes occur, and
BYOK calls never invoke this path.
The function enforces a per-user monthly limit (SUR-92) by summing request_count
across all action types for the current month (getMonthlyUsage applies no
action_type filter — transcribe and discover draw from the same shared cap) and
comparing the total to the value resolved by getResolvedMonthlyLimit — see
Per-user quota — user_profiles below for the SSoT
model. On quota overflow the function returns HTTP 429 and emits a
managed_ai_rate_limit_hit PostHog event via fireAndForget() (non-blocking — does not
pad the 429 response with PostHog network latency). On Supabase SELECT/RPC errors it
throws and returns HTTP 500, so quota enforcement never fails open. A missing
user_profiles row returns 500 profile_missing rather than silently using a default
constant.
Per-user quota — user_profiles
user_profiles.month_limit is the single source of truth for each user’s managed-AI
monthly cap. The Edge Function, the UsageBar in the Settings UI, and any future admin
tooling must all resolve the cap from this column — never from an env var, hard-coded
constant, or client-side cache.
| Column | Type | Notes |
|---|---|---|
month_limit | int4 | DEFAULT 50 (SUR-92). Authoritative monthly cap before any override. |
month_usage | int4 | Numerator snapshot maintained by the trg_ai_usage_month_delta trigger (function: apply_ai_usage_month_delta) on ai_usage_daily (SUR-73, captured in migration 0013). |
percent_use | real | Generated month_usage / month_limit — what ProfileScreen.UsageBar reads. |
allocation_override | int4 (nullable) | SUR-92 additive boost. NULL or 0 → no boost. |
allocation_override_expires_at | date (nullable) | Date the override stops applying. expires_at >= CURRENT_DATE is still active. NULL means “no override” — both the value and the expiry must be set for a boost to apply. A permanent override is expressed by setting expires_at to a far-future date. |
Resolution (mirrored in supabase/functions/anthropic-proxy/index.ts → getResolvedMonthlyLimit):
monthly_limit_user = month_limit + (allocation_override > 0 AND allocation_override_expires_at >= CURRENT_DATE ? allocation_override : 0)Numerator caveat — apply_ai_usage_month_delta has no DELETE branch. Deleting an
ai_usage_daily row will not decrement month_usage. In practice rows are never deleted
(the proxy only inserts and upserts), so this is not a current problem; flag here for
anyone introducing a cleanup job.
Insert paths. The canonical bootstrap is the handle_new_auth_user() Postgres
trigger (supabase/migrations/0020_generic_profile_creation_trigger.sql, SUR-362) —
it fires on every auth.users INSERT and creates a user_profiles row inheriting
the month_limit DEFAULT (50), user_tier DEFAULT (free), and a name derived
from raw_user_meta_data.full_name with split_part(email, '@', 1) as fallback.
The approve-waitlist Edge Function upsert is a best-effort overlay on top —
post-SUR-362 the row already exists for any signed-up user, so the upsert lands
approval-derived columns (waitlist name, month_usage snapshot, member_since)
rather than creating the row. Users cannot edit their own quota: the user_profiles
UPDATE policy restricts non-service-role writes to display_name and profile_bio
only. The SUR-230 admin dashboard will write allocation_override via the service
role.
BYOK exclusion
BYOK calls still go browser -> Anthropic directly (src/api.js), so they never reach the
Edge Function. Managed usage counters therefore exclude BYOK activity by architectural
separation. This also means BYOK users bypass quota enforcement entirely and remain
responsible for their own Anthropic billing.
RLS
Users may SELECT their own rows (foundation for a future self-service usage page).
INSERT and UPDATE are restricted to the service role — no RLS policy grants clients
write access.
Future hybrid design
This aggregate model is sufficient for quota enforcement and monetisation reporting.
If full per-request audit trails are needed later (e.g. debugging, dispute resolution),
add an append-only ai_usage_log event table as an additive migration alongside
this one. The aggregate table’s structure would not need to change.
Entitlements — user_profiles (SUR-327, Phase A of SUR-235)
Phase A of SUR-235 generalises the SUR-92
SSoT pattern from “managed-AI usage” to the full v1.5 paygated-features matrix.
The resolver in supabase/functions/_shared/entitlements.ts → getResolvedEntitlements
is the single authority for what a user is entitled to. The
me-entitlements
Edge Function exposes it to the client; useEntitlements() reads from there.
No client-side cache, env var, or duplicate constants module may shadow the
resolver — the React app deliberately does not own the Free / Pro defaults.
Columns introduced or tightened in
migration 0017_entitlements_phase_a.sql:
| Column | Type | Notes |
|---|---|---|
user_tier | text | NOT NULL DEFAULT 'free', CHECK (user_tier IN ('free', 'pro')). Originally added by SUR-318; locked down by SUR-327. SSoT for which tier the user is on. The Stripe webhook (SUR-85) will write here on subscribe / cancel. |
tier_started_at | timestamptz | NOT NULL DEFAULT now(). Drives the SUR-328 lapsed-Pro 30-day grace window. |
image_storage_bytes_used | bigint | NOT NULL DEFAULT 0. Running counter maintained transactionally by the image-upload Edge Function via adjust_image_storage_bytes. Backfilled once from storage.objects in 0017. |
The resolver returns:
type Entitlements = { tier: 'free' | 'pro' managedAiMonthlyLimit: number | null // null = unlimited; else month_limit + override customIdeasLimit: number | null maxDevices: number | null imageStorageBytesLimit: number | null capabilities: { reDiscoverIdeas: boolean // the strong wall (SUR-235) importReadwise: boolean importKindle: boolean exportJson: boolean }}managedAiMonthlyLimit honours the SUR-92 column path (month_limit + valid
allocation_override) when populated; the other fields come from FREE_DEFAULTS
/ PRO_DEFAULTS in _shared/entitlements.ts.
adjust_image_storage_bytes(user_id, delta) → bigint
SECURITY DEFINER function added in 0017 so the
image-upload
Edge Function can atomically add or subtract bytes from
user_profiles.image_storage_bytes_used without a client-side
read-modify-write race. EXECUTE is granted to service_role only —
anon and authenticated are explicitly revoked, so a malicious caller
cannot inflate or zero a peer’s quota. The function is paired with the
post-upload increment in the image-upload handler; the Edge Function
also issues a compensating storage delete if the increment fails so the
counter never lags reality.
Phase A is observation-only
The four non-quota gates (re_discover_ideas, custom_ideas_limit,
max_devices, image_storage_bytes) are wired up client-side and emit
would_have_blocked PostHog events but do not alter behaviour.
SUR-262 observes for two weeks;
SUR-328 flips the predicates to
enforcing 4xx responses keyed to whichever capability the data validates as
the strong wall. The SUR-92 quota gate continues to enforce per-user
month_limit unchanged.
Stripe billing — user_profiles + stripe_webhook_events (SUR-85)
SUR-85 adds the data-plane plumbing
behind paid Pro subscriptions. The corresponding Edge Functions
(create-checkout-session, stripe-webhook, create-billing-portal-session)
are documented in System Architecture → Stripe billing flows.
user_profiles — billing columns
Migration
0019_stripe_billing.sql
adds four additive columns. All four are nullable; existing users carry
NULLs until their first checkout.
| Column | Type | Notes |
|---|---|---|
stripe_customer_id | text (nullable) | Stripe cus_… identifier. Populated lazily by create-checkout-session on first checkout; persists across cancel + resubscribe so the same Stripe customer is reused. Partial UNIQUE index WHERE NOT NULL prevents two profiles from binding to the same customer. |
stripe_subscription_id | text (nullable) | Stripe sub_… of the active or most-recently-cancelled subscription. Updated by the webhook on customer.subscription.created/updated; retained on cancellation for audit correlation. |
subscription_status | text (nullable) | Lifecycle status mirrored from Stripe. Explicit CHECK enumerates the seven values we expect: active, trialing, past_due, canceled, paused, incomplete, incomplete_expired. A future Stripe-side addition (e.g. unpaid) surfaces as a webhook 500 → Stripe retry → prompt for a follow-up migration; we deliberately do not accept unknown statuses silently. |
subscription_current_period_end | timestamptz (nullable) | End of the current Stripe billing period. Drives the lapsed-Pro 30-day grace window in the resolver (see below). |
user_tier (from SUR-327) remains the SSoT for billing intent. The
webhook flips it to 'pro' on the first transition into an active state
and never writes 'free' on cancellation — the deferred-grace check
in the resolver owns the pro→free decision.
stripe_webhook_events — idempotency ledger
CREATE TABLE stripe_webhook_events ( event_id text PRIMARY KEY, -- Stripe's evt_... id event_type text NOT NULL, received_at timestamptz NOT NULL DEFAULT now(), processed_at timestamptz, -- NULL until handler completes payload jsonb NOT NULL -- full Stripe event for audit);CREATE INDEX stripe_webhook_events_type_received_idx ON stripe_webhook_events (event_type, received_at DESC);Stripe retries deliveries on any non-2xx response, so the webhook handler must be idempotent. The dedup primitive is:
INSERT INTO stripe_webhook_events (event_id, event_type, payload)VALUES (...)ON CONFLICT (event_id) DO NOTHINGRETURNING event_id;Empty RETURNING means “already processed” — the handler short-circuits to
a 200 without re-applying the event. After dispatch completes the row’s
processed_at is stamped; rows where processed_at stays NULL flag a
handler that crashed mid-dispatch (Stripe will retry, but operators may
want to know).
Ledger reclaim — processed_at IS NULL. A row exists in two valid
states: processed_at IS NOT NULL (handler completed, retries should
short-circuit to a 200 deduplicated) and processed_at IS NULL (the
INSERT succeeded but the dispatch handler crashed before the marker was
written). On a Stripe retry, the upsert re-conflicts; the function then
re-reads processed_at. NULL → re-run dispatch (the prior delivery’s
state changes never landed; failing to retry would silently strand a
paid subscription event). NOT NULL → skip dispatch and ack.
Concurrency note. Two parallel deliveries of the same event.id (rare
but possible) could both observe processed_at IS NULL and both
re-dispatch. Each handler is idempotent at the user_profiles level
(UPDATE with stable inputs) so the duplicate work is harmless. A
stricter contention guard via SELECT … FOR UPDATE was deferred — bring
it back if telemetry surfaces a race in production.
RLS: the table is service-role-only. RLS is enabled with no policies
for authenticated or anon, and the GRANT/REVOKE matrix is locked
down at the table level so even an accidentally-permissive future policy
would still hit a “permission denied” before reaching RLS. The webhook
handler is the only intended writer and reader; payloads contain PII
(billing address, card last4) that has no business reaching authenticated
client code.
Resolution — deferred grace window
The 30-day lapsed-Pro grace is enforced in
getResolvedEntitlements:
effective_tier = user_tier -- usually if user_tier = 'pro' AND subscription_status = 'canceled': if subscription_current_period_end + 30 days < now() OR subscription_current_period_end IS NULL: effective_tier = 'free'Why deferred and not column-flip: if Stripe sends a renewal during the
grace window (uncommon but possible — e.g. user updates a failing card),
the user remains effectively Pro without any restoration write. We avoid
a daily pg_cron sweep entirely, the column stays as billing state, and
the resolver returns the effective tier. The column-based flip is still
possible later if observability tooling needs user_tier='free' to be
visible at the row level — all the necessary data is already in
subscription_status + subscription_current_period_end.
Defensive NULL handling: subscription_current_period_end IS NULL on
a canceled row is an invariant break (the webhook always writes it), but
the resolver treats it as “grace already expired” so an upstream bug can
never grant indefinite Pro by accident.
Prompt versioning — prompts + ai_usage_events (SUR-316)
Managed-AI system prompts (and their model / max_tokens) are no longer code
constants. They live in the service-role-only prompts table and are loaded per call
by anthropic-proxy, so a prompt can be hot-swapped or rolled back by row and every
call can be sliced by the version that produced it. See Tables → prompts
and ai_usage_events for the column catalog, and
the Prompt versioning runbook for
activation / rollback procedure.
Load path & fail-open
getPrompt(supabaseAdmin, name) (supabase/functions/anthropic-proxy/promptLoader.ts)
selects the active rows (where is_active) on the first call of a cold isolate and caches
them in-memory for ~5 minutes. The three names are transcribe, discover_canon,
and discover_with_custom; the with-custom body is a template carrying a {{customList}}
placeholder the call site fills at runtime (renderDiscoverWithCustom). On any DB read
error the loader returns the hardcoded SUR-303 constant for that name and marks it
fallback: true — the request never blocks (same posture as the Azure guardrails). A
fallback records prompt_version = 0 and surfaces _promptFallback: true in the proxy
response, which src/api.js / photoAdapter.js expose as promptFallback and the hooks
report as a prompt_fallback PostHog event (mirroring _failOpen → guardrail_fail_open).
Operational consequences: a new active version takes up to the cache TTL to propagate, and
independently per Edge isolate. A spike in prompt_version = 0 means the prompts table is
unreachable (loader is failing open), not a quality change.
Per-call telemetry — ai_usage_events
ai_usage_daily (the quota aggregate) and upsert_ai_usage are unchanged. Per-call
version telemetry instead lands in the append-only ai_usage_events table via the
insert_ai_usage_event RPC, written best-effort alongside the quota upsert inside
recordUsage — a failure there is logged and swallowed (analytics must never fail a user
request), so ai_usage_events may under-count vs ai_usage_daily.request_count under DB
pressure. The same fields also fire to the server-side PostHog event
managed_ai_call_succeeded (prompt_name / prompt_version / fail_open /
prompt_fallback) — the foundation for the “managed AI quality by prompt_version”
insight and the A/B follow-up.
Grants
prompts, ai_usage_events, and insert_ai_usage_event are service-role-only with
explicit REVOKE … FROM public, anon, authenticated (SUR-499 convention) and
search_path-pinned secdef functions (the 0011 mandate). Migration 0030 additionally
closes a pre-existing gap by revoking EXECUTE on upsert_ai_usage from anon /
authenticated — Supabase’s default privileges had left it callable directly via PostgREST.
Access control / RLS (SUR-363)
Every user-owned table is RLS-protected. Post-SUR-363 (migration
0021_drop_waitlist_rls_gate.sql,
2026-05-12), the gate is ownership-only — auth.uid() = user_id — across
books, notes, custom_ideas, and the note-images storage bucket.
The 0007 waitlist-EXISTS overlay that briefly layered on top of those
predicates is gone.
| Table | Operation(s) | Policy |
|---|---|---|
books | ALL | auth.uid() = user_id (USING + WITH CHECK). Renamed back to “users see own books” from the SUR-7 “approved users see own books” alias. |
notes | ALL | Same shape. Note ownership is user_id; soft deletes (deleted=1) still respect the predicate. |
custom_ideas | ALL | Same shape. |
storage.objects (note-images) | ALL | auth.uid()::text = (storage.foldername(name))[1] — first path segment is the user’s UUID, and the predicate is identical USING + WITH CHECK. |
user_profiles | SELECT | Authenticated user can read own row only. |
user_profiles | UPDATE | Authenticated user can update own row only, with column-level restriction enforced client-side: only display_name and profile_bio. month_limit / allocation_override* / user_tier / Stripe columns are service-role writes only. |
user_profiles | INSERT / DELETE | No policy. INSERTs come from the handle_new_auth_user() trigger (SECURITY DEFINER); DELETEs cascade from auth.users. |
ai_usage_daily | SELECT | Authenticated user reads own rows (foundation for a future usage page). |
ai_usage_daily | INSERT / UPDATE | Service-role only — written exclusively by anthropic-proxy via rpc('upsert_ai_usage', …). |
wrapped_key_blobs | ALL | auth.uid() = user_id. Plus the select_fresh_transfer_blob RPC enforces a server-side TTL on transfer-v1 blobs. |
stripe_webhook_events | ALL | RLS enabled with no policies for authenticated / anon. Service-role-only by table-level GRANT/REVOKE. The payload contains billing PII; no path to authenticated client code. |
waitlist_requests | ALL | Service-role-only (legacy table preserved for historical data; the approve-waitlist Edge Function and admin UI are the remaining readers — SUR-369 tracks final retirement). |
admin_audit_log | SELECT (read policy) | Admin-dashboard schema (0039, SUR-289). Read policy admins read audit log — authenticated SELECT gated on the caller’s own user_profiles.role = 'admin'. Append-only by GRANT, not policy: service_role bypasses RLS, so the forensic invariant is held at the privilege layer — service_role is granted SELECT, INSERT only (not ALL), withholding UPDATE/DELETE/TRUNCATE; preceded by an explicit REVOKE ALL because the SUR-499/0033 default ACL grants Dxtm (incl. TRUNCATE) on every new table. authenticated has SELECT only. |
fixed_costs | ALL | Admin spend panel (0039, SUR-289). RLS enabled, no policies. Service-role-only by table-level GRANT/REVOKE; edited via the SUR-377 Edge Function. |
cost_snapshot | ALL | Admin spend panel (0039, SUR-289). RLS enabled, no policies. Service-role-only by table-level GRANT/REVOKE; append-only variable-cost cache. |
The relaxation co-deployed with migration 0020’s
handle_new_auth_user() trigger — without 0021 a direct signup would
hold a valid JWT and a user_profiles row but silently 403 every
books / notes / custom_ideas read because no waitlist_requests
row exists for them. The two migrations land in the same PR / branch so
the regression window between “trigger no longer back-links” and “RLS
no longer requires the back-link” never opens in production.
Rollback. docs/runbooks/open-signup-rollback.md
covers the five-step ladder. Reverting 0021 is step 5 (rarely needed —
step 1 “disable signups in Supabase Studio” closes the door without
touching schema and is the default lever).
Confirmed / Assumption / Unknown
- Confirmed:
- Dexie schema versions and Supabase table definitions match field-for-field, including provenance data and tags (
src/db.js,supabase/migrations/0001_initial_schema.sql,0003_notes_add_source_ingest.sql). - Export/import, custom idea CRUD, and tag rename cascades operate entirely on Dexie tables with subsequent cloud writes (
src/hooks/useSettings.js). - Outbox queue plus
collapseOutboxItemsguarantee idempotent replay semantics (tests cover JSON merging).
- Dexie schema versions and Supabase table definitions match field-for-field, including provenance data and tags (
- Assumption:
Incremental sync filtering (Now implemented (SUR-739 / SUR-652):fetchSince) is designed but not actually used byfetchAllCloud.fetchAllCloudpulls each table viafetchSincekeyed on the serverchange_seqwatermark, paged, with per-table checkpoints inmeta.lastSeq:<table>— see the “Incremental pull — server watermark + LWW guard” section above.sourceIddedupe is expected to be used by future adapters; current adapters set it tonull, so duplicate detection is effectively manual.- Storage cleanup for deleted notes is likely manual because no delete API is called.
- Unknown:
- Schema consistency still depends on developers running
npm run check:schemaand applying migrations; the repo has detection tooling but no automated enforcement pipeline (scripts/check-schema.js,supabase/migrations/*.sql). - Data retention policies for Supabase Storage (especially for deleted/tombstoned notes) are not codified.
- Multi-user or shared libraries are not modeled; all tables enforce owner-only visibility with no sharing semantics.
- Schema consistency still depends on developers running