SUR-242 — pre-deploy latency baseline
Captured against: main (last commit before feat/sur-242-guardrails deploys)
Date: 2026-04-28
AC reference: SUR-242 issue body — “Added latency on P50 request is within +300 ms of pre-change baseline on both paths”
Method
Existing client emits api_timing PostHog events at three points in src/hooks/useNoteForm.js:
- transcribe success path (line 210): callTranscribeImage, end-to-end
- transcribe failure path (line 300): same name, success: false
- discover path (line 376): callDiscoverIdeas, end-to-end
Event shape: { api_name: 'callTranscribeImage' | 'callDiscoverIdeas', duration_ms: <int>, success: <bool> }.
Both timings are client-side end-to-end — they include the network round trip plus the Edge Function dispatch plus Anthropic plus the response. The Phase 2 guardrail layer will add the Azure call(s) inside the Edge Function span; the client api_timing will reflect the full delta.
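As a rough illustration of what "client-side end-to-end" means here, the sketch below wraps an async call and emits one api_timing event with the shape described above. The names timedCall, capture and now are hypothetical and are not taken from src/hooks/useNoteForm.js; the capture callback stands in for whatever analytics call the hook really makes.

```javascript
// Hypothetical helper, for illustration only: not the code in useNoteForm.js.
// `capture(eventName, properties)` is an injected stand-in for the analytics client.
async function timedCall(apiName, fn, capture, now = () => Date.now()) {
  const start = now();
  let success = false;
  try {
    const result = await fn(); // network + Edge Function + upstream model + response
    success = true;
    return result;
  } finally {
    // Runs on both the success and the failure path, so failures carry success: false.
    capture('api_timing', {
      api_name: apiName,
      duration_ms: Math.round(now() - start),
      success,
    });
  }
}

// Usage sketch (assumes a PostHog client object named `posthog` is in scope):
// await timedCall('callDiscoverIdeas', () => callDiscoverIdeas(note),
//                 (name, props) => posthog.capture(name, props));
```

Because the timer surrounds the whole awaited call, anything added inside the Edge Function span shows up in duration_ms without any client change.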
Path A — historical query (preferred)
PostHog → Insights → + New → Trends.
- Series: event = api_timing
- Filter: success = true
- Breakdown by: api_name
- Aggregate: Median (P50) and 90th percentile (closest available; P95 isn’t an option in the UI dropdown, so read it from the values panel or use HogQL)
- Date range: last 30 days
- Display: Table
Or HogQL directly (PostHog → SQL):

    SELECT
      properties.api_name AS api,
      count() AS n_calls,
      quantile(0.5)(toFloat(properties.duration_ms)) AS p50_ms,
      quantile(0.95)(toFloat(properties.duration_ms)) AS p95_ms
    FROM events
    WHERE event = 'api_timing'
      AND properties.success = true
      AND timestamp > now() - INTERVAL 30 DAY
    GROUP BY api

If sample size on a path is < 20 events in the last 30 days, widen to 90 days or fall back to Path B.
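If the raw duration_ms values are exported (for example from the values panel), the same summary can be cross-checked locally. The sketch below is a minimal version using linear interpolation between ranks; percentile and summarize are hypothetical helper names, and the result may differ slightly from the query's quantile() output, which is not guaranteed to use the same interpolation.

```javascript
// Illustrative cross-check only; helper names are assumptions, not project code.
function percentile(values, q) {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * q; // fractional rank
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  // Linear interpolation between the two neighbouring samples.
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function summarize(durationsMs, minSamples = 20) {
  return {
    n_calls: durationsMs.length,
    p50_ms: percentile(durationsMs, 0.5),
    p95_ms: percentile(durationsMs, 0.95),
    // Mirrors the "< 20 events" rule: false means widen the window or use Path B.
    enough_samples: durationsMs.length >= minSamples,
  };
}
```

Run summarize once per api_name; a path with enough_samples: false should not be used as a baseline.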
Path B — controlled dev batch (alternative)
If Path A’s sample is too thin, switch to main locally and capture a fresh batch:

    git checkout main
    npm install
    npm run dev

Then in the browser: sign in, take 5 photos of book pages (use real ones, not the spike spotlight-page.jpg), and trigger Discover Ideas on 5 typed notes (each 100–200 words, no PII content). The PII restriction is only a precaution: the client PII regex belongs to Phase 3 on a different branch and doesn’t exist on main, so it cannot fire here. Each call emits api_timing.
Pull from PostHog with the same query as Path A, but narrow the date range to the last hour.
Baseline numbers
Fill in once you have the numbers. These become the comparison point for Phase 4.4 (post-deploy re-measurement).
| Path | n_calls | P50 (ms) | P95 (ms) |
|---|---|---|---|
| callTranscribeImage | 68 | 5482 | 9212 |
| callDiscoverIdeas | 61 | 2243 | 2829 |
Source: Path A
Captured by: Deji
Captured at: 2026-04-28 23:02
After deploy — comparison criteria
AC: P50 within +300 ms of baseline, measured on Azure S0 (F0 has cold-start variance — per the spike, F0 numbers don’t count toward the AC).
Budget vs expected delta
| Path | Baseline P50 | Azure calls per request | Expected delta (warm) | Post-deploy P50 target | Budget ceiling (+300 ms) | Headroom |
|---|---|---|---|---|---|---|
| callTranscribeImage | 5482 ms | 2 (post-flight shield + output moderate) | ~170 ms | ≤ 5652 ms | 5782 ms | ~130 ms |
| callDiscoverIdeas | 2243 ms | 1 (input shield only, moderate skipped) | ~85 ms | ≤ 2328 ms | 2543 ms | ~215 ms |
Both paths fit within the AC if Azure S0 holds at the ~85 ms warm latency the spike measured. Discover has comfortable margin (~215 ms); transcribe is tighter (~130 ms) because it pays for two Azure calls.
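The budget arithmetic can be reproduced with a few lines, which makes it easy to redo if the baseline or the per-call Azure latency changes. This is a sketch only: latencyBudget is a hypothetical helper, and the 85 ms per-call figure is the warm S0 latency the spike measured.

```javascript
// Illustrative arithmetic only; the function name is an assumption.
function latencyBudget(baselineP50Ms, azureCalls, perCallMs, budgetMs = 300) {
  const expectedDeltaMs = azureCalls * perCallMs;
  return {
    expectedDeltaMs,
    targetP50Ms: baselineP50Ms + expectedDeltaMs, // post-deploy P50 target
    ceilingP50Ms: baselineP50Ms + budgetMs,       // AC ceiling (+300 ms)
    headroomMs: budgetMs - expectedDeltaMs,       // negative means over budget
  };
}

console.log(latencyBudget(5482, 2, 85)); // callTranscribeImage
console.log(latencyBudget(2243, 1, 85)); // callDiscoverIdeas
```

With the baseline figures, this gives 170 / 5652 / 5782 / 130 for transcribe and 85 / 2328 / 2543 / 215 for discover.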
Risk note — transcribe is the path to watch
Transcribe pays for two Azure calls (Spotlighting on the OCR’d text + output moderation). If Azure S0 sees 200 ms warm latency under bursty load, 2 × 200 = 400 ms — over budget. Mitigation if it bites:
- The 5482 ms baseline P50 is dominated by Anthropic image OCR (which is the inherently expensive step). +400 ms on a 5.5 s call is +7%; users may not perceive it.
- P95 baseline on transcribe is already 9212 ms. The +300 ms AC is on P50 only, so a P95 widening from 9212 to ~9500 ms is acceptable per the AC wording.
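- If P50 does go over budget post-deploy, the easiest fix is to drop output moderation on transcribe too (the Phase 2 review fixup already removed it from discover for the same “no signal worth the latency” reason). That removes one Azure call: ~85 ms recovered at warm latency, or ~200 ms in the bursty scenario above, which leaves ~200 ms of added latency and is back under the 300 ms budget.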
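The risk above comes down to a break-even per-call latency: the budget divided by the number of Azure calls on the path. A minimal sketch, with hypothetical helper names, assuming the calls run sequentially so their latencies add:

```javascript
// Illustrative only; assumes Azure calls are sequential and latencies sum.
function breakEvenPerCallMs(azureCalls, budgetMs = 300) {
  return budgetMs / azureCalls;
}

function isOverBudget(azureCalls, perCallMs, budgetMs = 300) {
  return azureCalls * perCallMs > budgetMs;
}

console.log(breakEvenPerCallMs(2)); // transcribe today: 150 ms per call
console.log(breakEvenPerCallMs(1)); // one call (discover, or transcribe without moderation): 300 ms
console.log(isOverBudget(2, 200));  // bursty scenario with two calls: true
console.log(isOverBudget(1, 200));  // same latency with one call: false
```

So transcribe tolerates up to 150 ms per Azure call with two calls, and up to 300 ms if output moderation is dropped.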
Reading the post-deploy numbers
When re-measuring in Phase 4.4:
- Pull the same HogQL query, narrow date range to “since deploy”.
- Compare P50 only against the AC. P95 is informational.
- If transcribe P50 is over budget, look at the Edge Function logs first — Azure timeouts cause a single retry (Phase 2 review fixup) which adds 5+ seconds. A few retry events would skew P50 upward.
- Discover output is constrained to a 102-item idea allow-list, so its baseline variance (P95/P50 = 1.26x) should stay tight post-deploy. If discover variance widens significantly, look for _failOpen: true markers in the response (Azure unavailable causing fail-open).
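The checks in this list can be applied mechanically once the Phase 4.4 numbers exist. The sketch below uses a hypothetical checkPostDeploy helper; the post-deploy values in the usage comment are placeholders to show the call shape, not measurements.

```javascript
// Illustrative only; the function name and object shapes are assumptions.
function checkPostDeploy(baseline, post, budgetMs = 300) {
  const deltaP50Ms = post.p50 - baseline.p50;
  return {
    deltaP50Ms,
    withinAc: deltaP50Ms <= budgetMs,        // the AC is judged on P50 only
    baselineSpread: baseline.p95 / baseline.p50,
    postSpread: post.p95 / post.p50,         // informational: watch for widening
  };
}

// Baseline is from the table above; the second argument is a made-up placeholder.
// checkPostDeploy({ p50: 2243, p95: 2829 }, { p50: 2330, p95: 2950 });
```

A postSpread noticeably above baselineSpread on discover is the cue to look for fail-open markers; on transcribe, a failing withinAc is the cue to check Edge Function logs for retries.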
What “baseline” excludes
- Azure latency (not deployed yet)
- The client-side PII regex (Phase 3, ~1 ms on a 1–2 KB note — noise)
- The client-side BottomSheet review prompt (only fires on PII-shaped content; user-driven, not part of the API timing span)
- The new typed-error round trip on 422 guardrail_blocked (only fires on a block — separate measurement when we exercise that case manually in §5.7)