
Ingest Model Abstraction

Let me work through this layer by layer.

Start with what you have today. Your current ingest is effectively hardcoded into two paths: take a photo (image → transcribe → tag → save) or type manually (text → tag → save). The logic for these paths lives in useNoteForm and api.js, tightly coupled to the UI components. That works fine for two paths. It won’t work for six.

The core insight: every ingest source does the same four things in the same order, but each one does them differently.

Acquire → Normalize → Deduplicate → Persist

  • Acquire: get the raw content (call an API, parse a file, receive a photo, accept typed text)
  • Normalize: transform it into a common shape your app understands
  • Deduplicate: check whether you already have this content
  • Persist: write to Dexie, queue for Supabase sync

Here’s how each source maps:

| Source | Acquire | Normalize | Dedup key |
|---|---|---|---|
| Photo | Camera/file picker | Transcribe via Anthropic → text | None (each capture unique) |
| Manual | Text input | Already text | None |
| Readwise | API call → highlights JSON | Map fields to note shape | readwise:{highlightId} |
| Kindle | File upload → parse clippings | Extract entries → note shape | kindle:{book}:{location} |
| Email | Parse forwarded email | Extract body → note shape | email:{messageId} |
| Web clip | Browser extension message | Selection + URL → note shape | web:{url}:{selectionHash} |
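The keys in the last column can come from small pure functions, one per source. A minimal sketch, assuming each raw item already exposes the fields the key names (`highlightId`, `bookTitle`, `location`, `messageId`, `url`, `selectedText`). The `selectionHash` here is a 32-bit FNV-1a hash over whitespace-collapsed text, which is a hypothetical choice; a SHA-256 from the Web Crypto API would also work but is asynchronous.

```javascript
// Hypothetical key builders, keyed by the `source` values of the note shape.
// Photo and manual captures return null: every capture is treated as unique.

// 32-bit FNV-1a hash, returned as 8 hex chars. Not cryptographic; it only
// needs to be stable so the same selection always yields the same key.
function fnv1a(str) {
  let h = 0x811c9dc5
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h.toString(16).padStart(8, '0')
}

// Collapse runs of whitespace so spacing differences do not change the hash
// (assumption: they should not count as new content).
function normalizeSelection(text) {
  return text.trim().replace(/\s+/g, ' ')
}

const dedupKeys = {
  image: () => null,
  manual: () => null,
  readwise: (item) => `readwise:${item.highlightId}`,
  kindle: (item) => `kindle:${item.bookTitle}:${item.location}`,
  email: (item) => `email:${item.messageId}`,
  web_clip: (item) =>
    `web:${item.url}:${fnv1a(normalizeSelection(item.selectedText))}`,
}
```

Collapsing whitespace is a judgment call: it treats a re-clipped passage that differs only in spacing as the same content.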

The common shape — what every source produces after normalization:

{
  text: String,           // the note content, always present
  imageUrl: String|null,  // only photo capture produces this
  source: String,         // 'manual'|'image'|'readwise'|'kindle'|'email'|'web_clip'
  sourceId: String|null,  // dedup key, null for manual/image
  sourceMeta: Object,     // flexible per source
  ideas: String[],        // empty at ingest, filled during tagging
  status: String          // 'draft'|'tagged'|'saved'
}
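A single factory can keep every adapter's output in this shape, filling the shared defaults and rejecting unknown values. A sketch; `makeNote`, `SOURCES`, and `STATUSES` are hypothetical names, not part of the existing code.

```javascript
// Allowed values for the `source` discriminator and the `status` lifecycle.
const SOURCES = ['manual', 'image', 'readwise', 'kindle', 'email', 'web_clip']
const STATUSES = ['draft', 'tagged', 'saved']

// Build a normalized note, filling in the defaults most sources share.
// Each call gets fresh `sourceMeta` and `ideas` objects.
function makeNote({
  text,
  imageUrl = null,
  source,
  sourceId = null,
  sourceMeta = {},
  ideas = [],
  status = 'draft',
}) {
  if (typeof text !== 'string') throw new TypeError('text must be a string')
  if (!SOURCES.includes(source)) throw new Error(`unknown source: ${source}`)
  if (!STATUSES.includes(status)) throw new Error(`unknown status: ${status}`)
  return { text, imageUrl, source, sourceId, sourceMeta, ideas, status }
}
```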

The sourceMeta object is where source-specific richness lives without polluting the core schema:

// Readwise
{ highlightId: "...", bookTitle: "...", author: "...", category: "books" }
// Kindle
{ bookTitle: "...", author: "...", location: "loc 1234", clippedAt: "..." }
// Web clip
{ url: "...", pageTitle: "...", selectedText: "..." }

The adapter pattern — one per source:

Each source implements the same interface:

{
  sourceType: String,
  acquire: (input) => Promise<RawItem[]>,  // async; can return a batch
  normalize: (rawItem) => NormalizedNote | Promise<NormalizedNote>,  // async when it calls a service (photo transcription)
  dedupeKey: (rawItem) => String|null
}
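As a concrete instance of this interface, here is a sketch of a Readwise adapter. It assumes a hypothetical injected `fetchHighlights(input)` that resolves to already-flattened objects with `id`, `text`, `bookTitle`, `author`, and `category`; the real Readwise API response should be checked against its documentation before mapping fields.

```javascript
// Hypothetical factory: the network call is injected, so the adapter can be
// exercised with fake data and holds no HTTP or token details itself.
function createReadwiseAdapter(fetchHighlights) {
  // One key function shared by normalize and dedupeKey, so they never disagree.
  const key = (raw) => `readwise:${raw.id}`

  return {
    sourceType: 'readwise',

    // Acquire: a single call can return a whole batch of highlights.
    acquire: async (input) => fetchHighlights(input),

    // Normalize: map highlight fields onto the common note shape; book-level
    // details go into sourceMeta rather than the core schema.
    normalize: (raw) => ({
      text: raw.text,
      imageUrl: null,
      source: 'readwise',
      sourceId: key(raw),
      sourceMeta: {
        highlightId: raw.id,
        bookTitle: raw.bookTitle,
        author: raw.author,
        category: raw.category,
      },
      ideas: [],
      status: 'draft',
    }),

    // Dedup key: stable per highlight, so re-running an import skips known items.
    dedupeKey: key,
  }
}
```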

Then one orchestrator function runs the pipeline regardless of source:

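async function ingest(adapter, input) {
  const raw = await adapter.acquire(input) // may be a batch, e.g. many Readwise highlights
  // Normalize each item (awaiting covers async normalizers such as photo
  // transcription) and attach the dedup key, computed from the raw item
  const items = await Promise.all(
    raw.map(async (rawItem) => ({
      ...(await adapter.normalize(rawItem)),
      sourceId: adapter.dedupeKey(rawItem),
    }))
  )
  const unique = await filterDuplicates(items) // drops items whose sourceId is already stored
  return unique // hand off to tagging UI, then persist
}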
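`filterDuplicates` is not defined here. One possible version, sketched under the assumption of a hypothetical `hasSourceId(key)` lookup (in the app, a query against the Dexie `sourceId` index): it filters on each item's `sourceId`, items with a null key always pass, and a key that repeats within the same batch is kept only once.

```javascript
// Hypothetical factory: `hasSourceId(key)` resolves to true if a note with
// that sourceId is already stored. Injecting it keeps the filter
// storage-agnostic and testable.
function createFilterDuplicates(hasSourceId) {
  return async function filterDuplicates(items) {
    const seen = new Set() // keys already accepted from this batch
    const unique = []
    for (const item of items) {
      const key = item.sourceId
      if (key == null) {
        // null/undefined key: manual and photo notes are never deduplicated
        unique.push(item)
        continue
      }
      if (seen.has(key) || (await hasSourceId(key))) continue
      seen.add(key)
      unique.push(item)
    }
    return unique
  }
}
```

The in-batch check matters for file imports such as Kindle, where a single upload may contain repeated entries.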

Photo and manual are just adapters whose acquire step happens to involve the camera or a text field. Readwise is an adapter whose acquire calls an API and returns 50 items at once.
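Photo and manual capture as adapters might look like this. `transcribe(image)` is a hypothetical stand-in for the existing Anthropic call in api.js, and `imageUrl` is assumed to come from whatever upload step already exists. The photo adapter's `normalize` returns a promise because transcription is a network call, so this assumes `ingest` awaits each `normalize` result.

```javascript
// Hypothetical: `transcribe(image)` wraps the existing Anthropic call and
// resolves to the transcribed text.
function createPhotoAdapter(transcribe) {
  return {
    sourceType: 'image',
    // Acquire: one captured photo becomes a batch of one.
    acquire: async ({ image, imageUrl }) => [{ image, imageUrl }],
    // Normalize is async here because transcription is a network call.
    normalize: async (raw) => ({
      text: await transcribe(raw.image),
      imageUrl: raw.imageUrl,
      source: 'image',
      sourceId: null,
      sourceMeta: {},
      ideas: [],
      status: 'draft',
    }),
    dedupeKey: () => null, // each capture is unique
  }
}

// Manual entry: the typed text is already in note form.
const manualAdapter = {
  sourceType: 'manual',
  acquire: async ({ text }) => [{ text }],
  normalize: (raw) => ({
    text: raw.text,
    imageUrl: null,
    source: 'manual',
    sourceId: null,
    sourceMeta: {},
    ideas: [],
    status: 'draft',
  }),
  dedupeKey: () => null,
}
```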

What this means for the data model:

Two additions to your Dexie schema and Supabase table:

  • sourceId — indexed, nullable, unique-where-not-null. This is your dedup index.
  • sourceMeta — JSONB in Supabase, plain object in Dexie. Not indexed, just stored.

Your existing source field stays exactly as-is. It’s the type discriminator. sourceId is the instance identifier.
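Both stores can give unique-where-not-null without special handling. In Dexie, declaring the index as `&sourceId` makes it unique, and because `null` is not a valid IndexedDB key, records with a null `sourceId` are simply left out of the index. In Postgres, a plain unique constraint on the column already treats NULLs as distinct from each other. The in-memory model here only illustrates those semantics; `NoteStore` is a hypothetical class, not a proposal for the persistence layer.

```javascript
// In-memory model of a unique-where-not-null index, for illustration only.
class NoteStore {
  constructor() {
    this.notes = []
    this.bySourceId = new Map() // only non-null keys are indexed
  }

  add(note) {
    const key = note.sourceId
    if (key != null) {
      // Non-null keys must be unique, like a unique index
      if (this.bySourceId.has(key)) {
        throw new Error(`duplicate sourceId: ${key}`)
      }
      this.bySourceId.set(key, note)
    }
    // Null keys never collide, so manual and photo notes always insert
    this.notes.push(note)
    return this.notes.length
  }
}
```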

What this means for sequencing:

  1. Add sourceId + sourceMeta to the schema (small migration, no UI change)
  2. Refactor photo and manual capture as the first two adapters (proving the pattern with code you already have)
  3. Build the Readwise adapter — it slots into the proven abstraction
  4. Every future source is just a new adapter file
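"Just a new adapter file" suggests a small registry keyed by `sourceType`, so the UI can look up an adapter by name. A sketch; `registerAdapter` and `getAdapter` are hypothetical, and in a real module layout each adapter file would register itself on import.

```javascript
// Hypothetical registry: adapters are looked up by their sourceType.
const registry = new Map()

function registerAdapter(adapter) {
  // Check the interface up front so a malformed adapter fails loudly.
  for (const fn of ['acquire', 'normalize', 'dedupeKey']) {
    if (typeof adapter[fn] !== 'function') {
      throw new TypeError(`${adapter.sourceType}: missing ${fn}()`)
    }
  }
  if (registry.has(adapter.sourceType)) {
    throw new Error(`adapter already registered: ${adapter.sourceType}`)
  }
  registry.set(adapter.sourceType, adapter)
}

function getAdapter(sourceType) {
  const adapter = registry.get(sourceType)
  if (!adapter) throw new Error(`no adapter for source: ${sourceType}`)
  return adapter
}
```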

The migration in step 1 is low-risk because existing notes simply have sourceId: null and sourceMeta: {}. Nothing breaks.
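Because existing notes only need two default values, the backfill can be a pure function. A sketch; in Dexie it could run inside a `db.version(n).stores({...}).upgrade(tx => tx.table('notes').toCollection().modify(backfillSourceFields))` chain, where the version number and the table name `notes` are assumptions. Writing the defaults explicitly means reads never have to handle `undefined`.

```javascript
// Backfill the two new fields on a legacy note, mutating it in place
// (the form a Dexie Collection.modify() callback uses).
// Idempotent: notes that already have the fields are left untouched.
function backfillSourceFields(note) {
  if (note.sourceId === undefined) note.sourceId = null
  if (note.sourceMeta === undefined || note.sourceMeta === null) {
    note.sourceMeta = {}
  }
}
```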

The key thing this buys you: when you eventually build Kindle, email, or web clipper, none of them requires touching the pipeline, the sync logic, or the persistence layer. Each is a single file that implements the adapter interface: a sourceType plus three functions.
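To show what a single adapter file can look like for a new source, here is a sketch of a Kindle adapter whose `acquire` receives the uploaded file's contents as a string. It assumes the common `My Clippings.txt` layout (a title line ending in `(Author)`, a metadata line containing `Location N`, a blank line, the highlight text, then a `==========` separator); device firmware varies, so the patterns are assumptions to check against real exports.

```javascript
// Hypothetical Kindle adapter for "My Clippings.txt"-style text.

// Parse one clipping block into a raw item, or null if it is not a highlight.
function parseClipping(block) {
  // trim() also strips a leading byte-order mark (U+FEFF)
  const lines = block
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '')
  if (lines.length < 3) return null // bookmarks have no text line
  const [titleLine, metaLine, ...textLines] = lines
  const title = titleLine.match(/^(.*?)\s*\(([^()]*)\)$/) // "Title (Author)"
  const loc = metaLine.match(/location\s+([\d-]+)/i)
  if (!loc) return null
  const added = metaLine.match(/Added on (.+)$/)
  return {
    bookTitle: title ? title[1] : titleLine,
    author: title ? title[2] : '',
    location: `loc ${loc[1]}`,
    clippedAt: added ? added[1] : null, // kept as text; the format is localized
    text: textLines.join('\n'),
  }
}

const kindleAdapter = {
  sourceType: 'kindle',

  // Acquire: split the file on the ========== separator lines.
  acquire: async (fileText) =>
    fileText
      .split(/^==========\s*$/m)
      .map((block) => parseClipping(block))
      .filter((item) => item !== null),

  normalize: (raw) => ({
    text: raw.text,
    imageUrl: null,
    source: 'kindle',
    sourceId: kindleAdapter.dedupeKey(raw),
    sourceMeta: {
      bookTitle: raw.bookTitle,
      author: raw.author,
      location: raw.location,
      clippedAt: raw.clippedAt,
    },
    ideas: [],
    status: 'draft',
  }),

  dedupeKey: (raw) => `kindle:${raw.bookTitle}:${raw.location}`,
}
```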