SUR-242 — pre-deploy latency baseline

Captured against: main (last commit before feat/sur-242-guardrails deploys) Date: 2026-04-28 AC reference: SUR-242 issue body — “Added latency on P50 request is within +300 ms of pre-change baseline on both paths”

Method

Existing client emits api_timing PostHog events at three points in src/hooks/useNoteForm.js:

transcribe success path (line 210) — callTranscribeImage end-to-end
transcribe failure path (line 300) — same name, success: false
discover path (line 376) — callDiscoverIdeas end-to-end

Event shape: { api_name: 'callTranscribeImage' | 'callDiscoverIdeas', duration_ms: <int>, success: <bool> }.

Both timings are client-side end-to-end — they include the network round trip plus the Edge Function dispatch plus Anthropic plus the response. The Phase 2 guardrail layer will add the Azure call(s) inside the Edge Function span; the client api_timing will reflect the full delta.

Path A — historical query (preferred)

PostHog → Insights → + New → Trends.

Series: event = api_timing
Filter: success = true
Breakdown by: api_name
Aggregate: Median (P50) and 90th percentile (closest available; P95 isn’t an option in the UI dropdown — read it from the values panel or use HogQL)
Date range: last 30 days
Display: Table

Or HogQL directly (PostHog → SQL):

SELECT
  properties.api_name      AS api,
  count()                  AS n_calls,
  quantile(0.5)(toFloat(properties.duration_ms))  AS p50_ms,
  quantile(0.95)(toFloat(properties.duration_ms)) AS p95_ms
FROM events
WHERE event = 'api_timing'
  AND properties.success = true
  AND timestamp > now() - INTERVAL 30 DAY
GROUP BY api

If sample size on a path is < 20 events in the last 30 days, widen to 90 days or fall back to Path B.

Path B — controlled dev batch (alternative)

If Path A’s sample is too thin, switch to main locally and capture a fresh batch:

git checkout main
npm install
npm run dev

Then in the browser: sign in, take 5 photos of book pages (use real ones, not the spike spotlight-page.jpg), and trigger Discover Ideas on 5 typed notes (each 100–200 words, no PII content — the new client regex would fire on a card number even though the rest of Phase 3 is on a different branch… actually irrelevant on main since the PII regex doesn’t exist there). Each call emits api_timing.

Pull from PostHog with the same query as Path A, but narrow the date range to the last hour.

Baseline numbers

Fill in once you have the numbers. These become the comparison point for Phase 4.4 (post-deploy re-measurement).

Path	n_calls	P50 (ms)	P95 (ms)
`callTranscribeImage`	68	5482	9212
`callDiscoverIdeas`	61	2243	2829

Source: Path A Captured by: Deji Captured at: 28.04.2026 23:02

After deploy — comparison criteria

AC: P50 within +300 ms of baseline, measured on Azure S0 (F0 has cold-start variance — per the spike, F0 numbers don’t count toward the AC).

Budget vs expected delta

Path	Baseline P50	Azure calls per request	Expected delta (warm)	Post-deploy P50 target	Budget ceiling (+300 ms)	Headroom
`callTranscribeImage`	5482 ms	2 (post-flight shield + output moderate)	~170 ms	≤ 5652 ms	5782 ms	~130 ms
`callDiscoverIdeas`	2243 ms	1 (input shield only — moderate skipped)	~85 ms	≤ 2328 ms	2543 ms	~215 ms

Both paths fit within the AC if Azure S0 holds at the ~85 ms warm latency the spike measured. Discover has comfortable margin (~215 ms); transcribe is tighter (~130 ms) because it pays for two Azure calls.

Risk note — transcribe is the path to watch

Transcribe pays for two Azure calls (Spotlighting on the OCR’d text + output moderation). If Azure S0 sees 200 ms warm latency under bursty load, 2 × 200 = 400 ms — over budget. Mitigation if it bites:

The 5482 ms baseline P50 is dominated by Anthropic image OCR (which is the inherently expensive step). +400 ms on a 5.5 s call is +7%; users may not perceive it.
P95 baseline on transcribe is already 9212 ms. The +300 ms AC is on P50 only, so a P95 widening from 9212 to ~9500 ms is acceptable per the AC wording.
If P50 does go over budget post-deploy, easiest fix is to drop output moderation on transcribe too (Phase 2 review fixup already removed it from discover for the same “no signal worth the latency” reason). That recovers ~85 ms cleanly.

Reading the post-deploy numbers

When re-measuring in Phase 4.4:

Pull the same HogQL query, narrow date range to “since deploy”.
Compare P50 only against the AC. P95 is informational.
If transcribe P50 is over budget, look at the Edge Function logs first — Azure timeouts cause a single retry (Phase 2 review fixup) which adds 5+ seconds. A few retry events would skew P50 upward.
Discover output is constrained to a 102-item idea allow-list, so its baseline variance (P95/P50 = 1.26x) should stay tight post-deploy. If discover variance widens significantly, look for _failOpen: true markers in the response (Azure unavailable causing fail-open).

What “baseline” excludes

Azure latency (not deployed yet)
The client-side PII regex (Phase 3, ~1 ms on a 1–2 KB note — noise)
The client-side BottomSheet review prompt (only fires on PII-shaped content; user-driven, not part of the API timing span)
The new typed-error round trip on 422 guardrail_blocked (only fires on a block — separate measurement when we exercise that case manually in §5.7)