Ingest Model Abstraction

Let me work through this layer by layer.

Start with what you have today. Your current ingest is effectively hardcoded into two paths: take a photo (image → transcribe → tag → save) or type manually (text → tag → save). The logic for these paths lives in useNoteForm and api.js, tightly coupled to the UI components. That works fine for two paths. It won’t work for six.

The core insight: every ingest source does the same four things in the same order, but each one does them differently.

Acquire → Normalize → Deduplicate → Persist

Acquire: get the raw content (call an API, parse a file, receive a photo, accept typed text)
Normalize: transform it into a common shape your app understands
Deduplicate: check whether you already have this content
Persist: write to Dexie, queue for Supabase sync

Here’s how each source maps:

Source	Acquire	Normalize	Dedup key
Photo	Camera/file picker	Transcribe via Anthropic → text	None (each capture unique)
Manual	Text input	Already text	None
Readwise	API call → highlights JSON	Map fields to note shape	`readwise:{highlightId}`
Kindle	File upload → parse clippings	Extract entries → note shape	`kindle:{book}:{location}`
Email	Parse forwarded email	Extract body → note shape	`email:{messageId}`
Web clip	Browser extension message	Selection + URL → note shape	`web:{url}:{selectionHash}`
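
The dedup keys in the table can be sketched as one small function per source. This is only an illustration: the raw field names (highlightId, bookTitle, location, messageId, url, selectedText) are assumptions borrowed from the examples in this discussion, and the hash is a stand-in, since no particular hash for selectionHash is specified here.

```javascript
// Small deterministic string hash (32-bit FNV-1a over UTF-16 code units).
// Assumption: any stable hash would do; this one just keeps the sketch dependency-free.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// One key builder per source type. Photo and manual return null,
// which means "never treat this item as a duplicate".
const dedupeKeys = {
  image: () => null,
  manual: () => null,
  readwise: (raw) => `readwise:${raw.highlightId}`,
  kindle: (raw) => `kindle:${raw.bookTitle}:${raw.location}`,
  email: (raw) => `email:${raw.messageId}`,
  web_clip: (raw) => `web:${raw.url}:${fnv1a(raw.selectedText)}`,
};
```

Because every key is prefixed with its source, a Readwise highlight and a Kindle clipping can never collide even if their underlying IDs happen to match.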

The common shape — what every source produces after normalization:

{
  text:        String,       // the note content, always present
  imageUrl:    String|null,  // only photo capture produces this
  source:      String,       // 'manual'|'image'|'readwise'|'kindle'|'email'|'web_clip'
  sourceId:    String|null,  // dedup key, null for manual/image
  sourceMeta:  Object,       // flexible per source
  ideas:       String[],     // empty at ingest, filled during tagging
  status:      String        // 'draft'|'tagged'|'saved'
}
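
As a rough sketch, the common shape could be enforced by a small factory that every adapter calls from normalize. The name createNote and the choice of 'draft' as the starting status are assumptions made for illustration; only the field list itself comes from the shape above.

```javascript
// Allowed values for the `source` discriminator, taken from the shape above.
const SOURCES = ["manual", "image", "readwise", "kindle", "email", "web_clip"];

// Hypothetical helper: builds a note in the common shape and fills defaults.
function createNote({ text, source, imageUrl = null, sourceId = null, sourceMeta = {} }) {
  if (typeof text !== "string" || text.trim() === "") {
    throw new TypeError("text is required");
  }
  if (!SOURCES.includes(source)) {
    throw new TypeError(`unknown source: ${source}`);
  }
  return {
    text,
    imageUrl,
    source,
    sourceId,
    sourceMeta,
    ideas: [],        // empty at ingest, filled during tagging
    status: "draft",  // assumption: new notes start as drafts
  };
}

// Usage: a manually typed note needs only text and source.
const typed = createNote({ text: "Buy oat milk", source: "manual" });
```

Centralizing the defaults this way means an adapter cannot forget a field, and a typo in the source value fails loudly at ingest rather than surfacing later during sync.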

The sourceMeta object is where source-specific richness lives without polluting the core schema:

// Readwise
{ highlightId: "...", bookTitle: "...", author: "...", category: "books" }

// Kindle
{ bookTitle: "...", author: "...", location: "loc 1234", clippedAt: "..." }

// Web clip
{ url: "...", pageTitle: "...", selectedText: "..." }

The adapter pattern — one per source:

Each source implements the same interface:

{
  sourceType:  String,
  acquire:     (input) => Promise<RawItem[]>,  // async; can return a batch
  normalize:   (rawItem) => NormalizedNote,
  dedupeKey:   (rawItem) => String|null        // becomes the note's sourceId
}

Then one orchestrator function runs the pipeline regardless of source:

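async function ingest(adapter, input) {
  const raw = await adapter.acquire(input)
  // dedupeKey is defined on raw items, so key each item while it is still raw
  const items = raw.map((rawItem) => ({
    ...adapter.normalize(rawItem),
    sourceId: adapter.dedupeKey(rawItem)
  }))
  const unique = await filterDuplicates(items)  // drops notes whose sourceId is already stored
  return unique  // hand off to tagging UI, then persist
}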
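
The filterDuplicates step is not spelled out above, so here is one way it might look, assuming each normalized note already carries its dedup key in sourceId. The factory createDuplicateFilter and the injected findExistingIds lookup are hypothetical names; in the real app the lookup would presumably query the sourceId index in Dexie.

```javascript
// `findExistingIds(keys)` is an assumed async lookup that resolves to the
// subset of `keys` already present in storage.
function createDuplicateFilter(findExistingIds) {
  return async function filterDuplicates(notes) {
    // Only notes with a key can be duplicates; null means "always unique".
    const keys = notes
      .map((note) => note.sourceId)
      .filter((key) => key !== null && key !== undefined);
    const known = new Set(await findExistingIds(keys));
    const seenInBatch = new Set();

    return notes.filter((note) => {
      const key = note.sourceId;
      if (key === null || key === undefined) return true; // manual / photo
      if (known.has(key) || seenInBatch.has(key)) return false;
      seenInBatch.add(key); // also drop repeats inside the same batch
      return true;
    });
  };
}
```

Batching the lookup into a single call matters for sources like Readwise, where one acquire can return dozens of items; checking them one at a time would mean dozens of round trips to storage.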

Photo and manual are just adapters whose acquire step happens to involve the camera or a text field. Readwise is an adapter whose acquire calls an API and returns 50 items at once.
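
To make the contrast concrete, here is a sketch of a manual adapter next to a Readwise one. It is illustrative only: fetchHighlights is a hypothetical injected function standing in for the real API call, and the raw highlight fields are assumptions mirroring the sourceMeta example above, not Readwise's actual response format.

```javascript
// Manual entry: acquire just wraps the typed text in a one-item batch.
const manualAdapter = {
  sourceType: "manual",
  acquire: async (typedText) => [{ text: typedText }],
  normalize: (raw) => ({
    text: raw.text, imageUrl: null, source: "manual", sourceId: null,
    sourceMeta: {}, ideas: [], status: "draft",
  }),
  dedupeKey: () => null,
};

// Readwise: acquire delegates to the injected fetcher and may return many items.
function createReadwiseAdapter(fetchHighlights) {
  const dedupeKey = (raw) => `readwise:${raw.highlightId}`;
  return {
    sourceType: "readwise",
    acquire: async (input) => fetchHighlights(input),
    normalize: (raw) => ({
      text: raw.text, imageUrl: null, source: "readwise", sourceId: dedupeKey(raw),
      sourceMeta: {
        highlightId: raw.highlightId, bookTitle: raw.bookTitle,
        author: raw.author, category: raw.category,
      },
      ideas: [], status: "draft",
    }),
    dedupeKey,
  };
}
```

Injecting the fetcher keeps the adapter testable without network access: a test can pass a function that resolves to a fixed array of highlights.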

What this means for the data model:

Two additions to your Dexie schema and Supabase table:

sourceId — indexed, nullable, unique-where-not-null. This is your dedup index.
sourceMeta — JSONB in Supabase, plain object in Dexie. Not indexed, just stored.

Your existing source field stays exactly as-is. It’s the type discriminator. sourceId is the instance identifier.

What this means for sequencing:

Add sourceId + sourceMeta to the schema (small migration, no UI change)
Refactor photo and manual capture as the first two adapters (proving the pattern with code you already have)
Build the Readwise adapter — it slots into the proven abstraction
Every future source is just a new adapter file

The schema migration in the first step is low-risk because existing notes simply have sourceId: null and sourceMeta: {}. Nothing breaks.
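
The backfill for existing notes could be as small as the function below. The name withIngestDefaults is made up for this sketch. With Dexie, something like it could be applied to each row inside a version upgrade (for example through Collection.modify), though the table name and version number depend on your current schema.

```javascript
// Fill in the two new fields on a note written before the migration.
// Notes that already carry values are left untouched.
function withIngestDefaults(note) {
  return {
    ...note,
    sourceId: note.sourceId ?? null,
    sourceMeta: note.sourceMeta ?? {},
  };
}

// A legacy note gains the defaults; an imported note keeps its own values.
const legacy = withIngestDefaults({ text: "old note", source: "manual" });
const imported = withIngestDefaults({
  text: "a highlight",
  source: "readwise",
  sourceId: "readwise:42",
  sourceMeta: { bookTitle: "Some Book" },
});
```

Because the function only adds missing fields, running it twice is harmless, which is a useful property if a migration is interrupted and retried.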

The key thing this buys you: when you eventually build Kindle, email, or web clipper, none of them require touching the pipeline, the sync logic, or the persistence layer. Each is a single file that implements the adapter interface: acquire, normalize, and dedupeKey, plus a sourceType label.