Ingest Model Abstraction
Let me work through this layer by layer.
Start with what you have today. Your current ingest is effectively hardcoded into two paths: take a photo (image → transcribe → tag → save) or type manually (text → tag → save). The logic for these paths lives in useNoteForm and api.js, tightly coupled to the UI components. That works fine for two paths. It won’t work for six.
The core insight: every ingest source does the same four things in the same order, but each one does them differently.
Acquire → Normalize → Deduplicate → Persist
- Acquire: get the raw content (call an API, parse a file, receive a photo, accept typed text)
- Normalize: transform it into a common shape your app understands
- Deduplicate: check whether you already have this content
- Persist: write to Dexie, queue for Supabase sync
Here’s how each source maps:
| Source | Acquire | Normalize | Dedup key |
|---|---|---|---|
| Photo | Camera/file picker | Transcribe via Anthropic → text | None (each capture unique) |
| Manual | Text input | Already text | None |
| Readwise | API call → highlights JSON | Map fields to note shape | readwise:{highlightId} |
| Kindle | File upload → parse clippings | Extract entries → note shape | kindle:{book}:{location} |
| Email | Parse forwarded email | Extract body → note shape | email:{messageId} |
| Web clip | Browser extension message | Selection + URL → note shape | web:{url}:{selectionHash} |
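The dedup keys in the table can be sketched as small pure functions. This is only an illustration: the `dedupKeys` map and the `hashSelection` helper are hypothetical names, the raw field names are borrowed from the sourceMeta examples, and the djb2-style hash is an assumption, since nothing here specifies which hash the web clipper should use.

```javascript
// Hypothetical helper: a simple non-cryptographic string hash (djb2-style),
// used only to turn selected text into a short, stable token.
function hashSelection(text) {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return h.toString(16);
}

// One key builder per source, following the "Dedup key" column.
// Field names on the raw items are assumptions for illustration.
const dedupKeys = {
  manual: () => null, // no dedup for typed notes
  image: () => null, // each capture is treated as unique
  readwise: (h) => `readwise:${h.highlightId}`,
  kindle: (c) => `kindle:${c.bookTitle}:${c.location}`,
  email: (m) => `email:${m.messageId}`,
  web_clip: (w) => `web:${w.url}:${hashSelection(w.selectedText)}`,
};
```

Because the builders are deterministic, importing the same highlight or clipping twice yields the same key, which is what lets a later lookup on sourceId reject the second copy.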
The common shape — what every source produces after normalization:
```
{
  text: String,            // the note content, always present
  imageUrl: String|null,   // only photo capture produces this
  source: String,          // 'manual'|'image'|'readwise'|'kindle'|'email'|'web_clip'
  sourceId: String|null,   // dedup key, null for manual/image
  sourceMeta: Object,      // flexible per source
  ideas: String[],         // empty at ingest, filled during tagging
  status: String           // 'draft'|'tagged'|'saved'
}
```
The sourceMeta object is where source-specific richness lives without polluting the core schema:
```
// Readwise
{ highlightId: "...", bookTitle: "...", author: "...", category: "books" }

// Kindle
{ bookTitle: "...", author: "...", location: "loc 1234", clippedAt: "..." }

// Web clip
{ url: "...", pageTitle: "...", selectedText: "..." }
```
The adapter pattern (one per source):
Each source implements the same interface:
```
{
  sourceType: String,
  acquire: (input) => RawItem[],          // can return a batch
  normalize: (rawItem) => NormalizedNote,
  dedupeKey: (rawItem) => String|null
}
```
Then one orchestrator function runs the pipeline regardless of source:
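```javascript
async function ingest(adapter, input) {
  const raw = await adapter.acquire(input)
  // dedupeKey expects a raw item, so compute the key here and
  // store it on the normalized note as sourceId
  const items = raw.map((rawItem) => ({
    ...adapter.normalize(rawItem),
    sourceId: adapter.dedupeKey(rawItem),
  }))
  const unique = await filterDuplicates(items) // drops notes whose sourceId already exists
  return unique // hand off to tagging UI, then persist
}
```
Photo and manual are just adapters whose acquire step happens to involve the camera or a text field. Readwise is an adapter whose acquire calls an API and returns 50 items at once.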
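To make the adapter interface concrete, here is a minimal sketch of two adapters and a duplicate filter run end to end. Several things are assumptions rather than part of the design above: `baseNote`, `runIngest`, and `knownIds` are hypothetical names, the Readwise adapter takes already-fetched highlights instead of calling the real API, and the raw field names are borrowed from the sourceMeta examples. In the app, the `knownIds` lookup would presumably be a query against the sourceId index.

```javascript
// Builds a note in the common shape; sourceId is filled in by the pipeline.
function baseNote(source, text, sourceMeta) {
  return { text, imageUrl: null, source, sourceId: null, sourceMeta, ideas: [], status: 'draft' };
}

const manualAdapter = {
  sourceType: 'manual',
  acquire: async (typedText) => [{ text: typedText }], // always a batch of one
  normalize: (raw) => baseNote('manual', raw.text, {}),
  dedupeKey: () => null, // manual notes are never deduplicated
};

const readwiseAdapter = {
  sourceType: 'readwise',
  // Assumption: the caller has already fetched the highlights,
  // so no real Readwise request or response shape is modeled here.
  acquire: async (highlights) => highlights,
  normalize: (raw) =>
    baseNote('readwise', raw.text, {
      highlightId: raw.highlightId,
      bookTitle: raw.bookTitle,
      author: raw.author,
    }),
  dedupeKey: (raw) => `readwise:${raw.highlightId}`,
};

// Acquire, normalize, then drop anything whose key is already known.
async function runIngest(adapter, input, knownIds) {
  const raw = await adapter.acquire(input);
  const notes = raw.map((item) => ({
    ...adapter.normalize(item),
    sourceId: adapter.dedupeKey(item),
  }));
  const seen = new Set(knownIds);
  return notes.filter((note) => {
    if (note.sourceId === null) return true; // no key: always keep
    if (seen.has(note.sourceId)) return false; // already stored, or repeated in this batch
    seen.add(note.sourceId);
    return true;
  });
}
```

With `knownIds` set to `['readwise:1']` and a batch holding highlights 1, 2, and 2 again, `runIngest(readwiseAdapter, batch, knownIds)` resolves to a single note with sourceId `readwise:2`. The same function handles `manualAdapter` unchanged, which is the point of the pattern.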
What this means for the data model:
Two additions to your Dexie schema and Supabase table:
- sourceId: indexed, nullable, unique-where-not-null. This is your dedup index.
- sourceMeta: JSONB in Supabase, plain object in Dexie. Not indexed, just stored.
Your existing source field stays exactly as-is. It’s the type discriminator. sourceId is the instance identifier.
What this means for sequencing:
1. Add sourceId + sourceMeta to the schema (small migration, no UI change)
2. Refactor photo and manual capture as the first two adapters (proving the pattern with code you already have)
3. Build the Readwise adapter, which slots into the proven abstraction
4. Every future source is just a new adapter file
The schema migration in the first step is low-risk because existing notes simply get sourceId: null and sourceMeta: {}. Nothing breaks.
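As a rough sketch of what that backfill amounts to, the function below adds the two defaults to a note that lacks them and leaves already-migrated notes alone. The name `migrateNote` is hypothetical, and how it gets applied to each stored record (for example during a Dexie version upgrade) is left as an assumption.

```javascript
// Hypothetical backfill for one stored note: add the two new fields
// with their defaults, without touching notes that already have them.
function migrateNote(note) {
  if (note.sourceId === undefined) note.sourceId = null;
  if (note.sourceMeta === undefined) note.sourceMeta = {};
  return note;
}

// Usage: existing notes gain the defaults, newer notes pass through unchanged.
const legacy = migrateNote({ text: 'old note', source: 'manual' });
const imported = migrateNote({
  text: 'a highlight',
  source: 'readwise',
  sourceId: 'readwise:1',
  sourceMeta: { bookTitle: '...' },
});
```

The function is idempotent, so running it twice over the same data is harmless.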
The key thing this buys you: when you eventually build Kindle, email, or web clipper, none of them require touching the pipeline, the sync logic, or the persistence layer. They’re each a single file that implements the adapter interface: a sourceType plus the acquire, normalize, and dedupeKey functions.