Safari (Reading List + history fallback)
The Safari connector is an edge harvester that pushes Reading List adds,
Bookmarks snapshots, and (optionally) browsing history from a macOS
device into Carabase. It’s the “fallback” path: it works without a browser
extension installed, by reading directly from ~/Library/Safari/.
A future browser-extension path (Phase 4 streaming push) lands per-event
streams into the same safari_reading_list substrate table — the
Reading List/history fallback covers users who haven’t installed the
extension yet, or browser sessions where the extension isn’t running.
Highlights — Lane A part 1 (live in v0.1)
The first Safari PR shipped a focused subset of the spec. Reading List adds + Bookmarks-snapshot change-detection are live; per-visit history substrate writes + per-domain history rollups + bookmarks-as-folio-readme land in Lane A part 2.
- Reading List items land as substrate rows. Every Reading List add
  the desktop pushes lands in safari_reading_list keyed on
  (workspace_id, url_hash) with scrape_status = 'pending' and
  artifact_id = NULL. The downstream safari-scrape worker that hydrates
  the body into a Tier-0 artifact + stamps artifact_id lands in part 2;
  until then, Reading List rows persist for replay-safety but no
  searchable artifact is produced.
- History is counted, not (yet) persisted per-visit. The desktop
  pushes the history batch; the route advances
  safari_sync_state.last_history_synced_at to the newest visit timestamp
  it sees and returns. Per-visit rows + Tier 2 daily rollup writes are
  deferred to part 2.
- Bookmarks snapshot is hashed. The desktop sends a bookmarks
  { source, items[], sha256 } blob; the route stores the sha256 + source
  on safari_sync_state so subsequent syncs short-circuit when the
  bookmarks haven’t changed. Folio-readme writes for the bookmarks list
  land in part 2.
- Workspace-scoped sync state. safari_sync_state is keyed by
  workspace_id (PK), not per-device. Multi-device pushes use atomic
  GREATEST(existing, candidate) upserts so an older sync from device B
  can’t rewind a newer cursor from device A.
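The cursor and bookmarks-hash behaviour described in the list above can be sketched in memory as follows. This is only an illustration, not the route's actual code: the SyncState shape, the epoch-millisecond cursor, the treatment of a missing cursor as "no value", and the function names advanceCursor, bookmarksChanged and applySync are all assumptions made for the example.

```typescript
// Hypothetical in-memory model of one safari_sync_state row (one per workspace).
interface SyncState {
  lastHistorySyncedAt: number | null; // epoch ms; the real column type is an assumption
  bookmarksSha256: string | null;
  bookmarksSource: string | null;
}

// Shape of the bookmarks blob as described in the text.
interface BookmarksSnapshot {
  source: string;
  items: unknown[];
  sha256: string;
}

// Mirrors the intent of GREATEST(existing, candidate): the cursor never moves backwards.
// Treating null as "no value yet" is an assumption of this sketch.
function advanceCursor(existing: number | null, candidate: number | null): number | null {
  if (existing === null) return candidate;
  if (candidate === null) return existing;
  return Math.max(existing, candidate);
}

// True when the pushed snapshot differs from the stored hash and needs processing.
function bookmarksChanged(state: SyncState, snapshot: BookmarksSnapshot): boolean {
  return state.bookmarksSha256 !== snapshot.sha256;
}

// Applies one push: advance the history cursor, then store the hash only if it changed.
function applySync(
  state: SyncState,
  newestVisitAt: number | null,
  snapshot: BookmarksSnapshot | null
): SyncState {
  const next: SyncState = {
    lastHistorySyncedAt: advanceCursor(state.lastHistorySyncedAt, newestVisitAt),
    bookmarksSha256: state.bookmarksSha256,
    bookmarksSource: state.bookmarksSource,
  };
  if (snapshot !== null && bookmarksChanged(state, snapshot)) {
    next.bookmarksSha256 = snapshot.sha256;
    next.bookmarksSource = snapshot.source;
  }
  return next;
}
```

In this sketch, a push from device B carrying an older newest-visit timestamp leaves the cursor written by device A untouched, and a snapshot whose sha256 matches the stored value changes nothing.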
The Safari edge harvester ships in the Desktop client (separate repo).
Pair via the standard
edge-harvester pairing flow using
connector: "safari". The push endpoint is
POST /api/v1/safari/sync.
Filter shape
SafariFilters lives in src/types/sync-rules.ts. The fields below are
declared on the type today; only those marked (part 1) are enforced
by the part-1 sync route. The rest activate when Lane A part 2 lands.
- excludeDomains / includeDomains: host substring match, case-insensitive. (part 2)
- dailyTopN: top-N per-domain visit count surfaced in the Tier 2 daily rollup body (default 10). (part 2)
- minDailyVisits: drop daily rollups with fewer than this many total visits (default 3, suppresses noise). (part 2)
- minVisitCount: per-URL minimum visit count before it appears at all (default 1; set to 2+ to filter out single-tap accidents). (part 2)
- includeReadingList / includeBookmarks: gate Reading List + Bookmarks pipelines. (part 1: Reading List honoured today; bookmarks routing in part 2)
- scrapeTopVisits: opt-in body hydration for the top-N visited URLs in the daily rollup. (part 2)
- promoteByEntityDomain: Tier-0 promote per-visit when the URL’s host matches an existing entity’s url_domain. (part 2)
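To make the part-2 history filters concrete, here is a rough sketch of how the domain match and the rollup thresholds might combine. It is not the shipped implementation: the interfaces below are a hypothetical subset of SafariFilters, and the precedence rules (exclude wins over include, an empty include list allows every host, minDailyVisits is checked against the total that remains after filtering) are assumptions the text does not confirm.

```typescript
// Hypothetical subset of SafariFilters; the real type lives in src/types/sync-rules.ts.
interface DomainFilters {
  excludeDomains?: string[];
  includeDomains?: string[];
}

interface RollupFilters extends DomainFilters {
  dailyTopN?: number;      // default 10
  minDailyVisits?: number; // default 3
  minVisitCount?: number;  // default 1
}

interface UrlVisits { host: string; url: string; visits: number; }
interface DomainCount { host: string; visits: number; }

// Case-insensitive host substring match, as described for exclude/includeDomains.
function hostMatches(host: string, patterns: string[]): boolean {
  const h = host.toLowerCase();
  return patterns.some((p) => h.indexOf(p.toLowerCase()) !== -1);
}

// Assumption: exclude takes precedence; an empty or absent include list allows all hosts.
function passesDomainFilters(host: string, f: DomainFilters): boolean {
  if (f.excludeDomains && hostMatches(host, f.excludeDomains)) return false;
  if (f.includeDomains && f.includeDomains.length > 0) {
    return hostMatches(host, f.includeDomains);
  }
  return true;
}

// Builds one day's per-domain rollup, or returns null when the day is too quiet.
function dailyRollup(rows: UrlVisits[], f: RollupFilters = {}): DomainCount[] | null {
  const topN = f.dailyTopN ?? 10;
  const minDaily = f.minDailyVisits ?? 3;
  const minPerUrl = f.minVisitCount ?? 1;

  const perHost: Record<string, number> = {};
  let total = 0;
  for (const r of rows) {
    // Drop URLs below the per-URL threshold and hosts rejected by the domain filters.
    if (r.visits < minPerUrl || !passesDomainFilters(r.host, f)) continue;
    const host = r.host.toLowerCase();
    perHost[host] = (perHost[host] ?? 0) + r.visits;
    total += r.visits;
  }
  if (total < minDaily) return null;

  // Highest visit count first; ties broken by host name for a stable order.
  return Object.keys(perHost)
    .map((host) => ({ host, visits: perHost[host] ?? 0 }))
    .sort((a, b) => b.visits - a.visits || a.host.localeCompare(b.host))
    .slice(0, topN);
}
```

With minVisitCount set to 2, a URL visited once never reaches the per-domain totals, so a single accidental tap cannot push a domain into the top-N list.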
Substrate columns
safari_reading_list: per-URL row keyed (workspace_id, url_hash) UNIQUE with url, title, added_at, read_at, scrape_status
(closed enum pending | ready | thin | failed), and artifact_id
(stamped by the scrape worker on success). See the
database schema reference.
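As an illustration of the row shape and its unique key, the sketch below models safari_reading_list in memory. The TypeScript column types, the ReadingListPush helper type, and the conflict behaviour (a replayed push refreshes the pushed fields but keeps the existing scrape_status and artifact_id) are assumptions for the example; the database schema reference remains the source of truth.

```typescript
// The closed enum for scrape_status, as listed above.
const SCRAPE_STATUSES = ["pending", "ready", "thin", "failed"] as const;
type ScrapeStatus = (typeof SCRAPE_STATUSES)[number];

// Runtime guard for values arriving from outside the type system.
function isScrapeStatus(value: string): value is ScrapeStatus {
  return (SCRAPE_STATUSES as readonly string[]).indexOf(value) !== -1;
}

// Column names follow the text; the TypeScript types are assumptions.
interface ReadingListRow {
  workspace_id: string;
  url_hash: string;
  url: string;
  title: string | null;
  added_at: string;
  read_at: string | null;
  scrape_status: ScrapeStatus;
  artifact_id: string | null; // stamped by the scrape worker on success
}

// Hypothetical shape of one pushed item: everything except the worker-owned columns.
type ReadingListPush = Omit<ReadingListRow, "scrape_status" | "artifact_id">;

// Upsert keyed on (workspace_id, url_hash), so replaying a push never duplicates a row.
function upsertReadingList(
  table: Record<string, ReadingListRow>,
  item: ReadingListPush
): ReadingListRow {
  const key = item.workspace_id + ":" + item.url_hash;
  const existing: ReadingListRow | undefined = table[key];
  const row: ReadingListRow = {
    workspace_id: item.workspace_id,
    url_hash: item.url_hash,
    url: item.url,
    title: item.title,
    added_at: item.added_at,
    read_at: item.read_at,
    // New rows start as pending with no artifact; existing worker state is preserved.
    scrape_status: existing ? existing.scrape_status : "pending",
    artifact_id: existing ? existing.artifact_id : null,
  };
  table[key] = row;
  return row;
}
```

Pushing the same item twice leaves a single row in the table, which is the replay-safety property the part-1 route relies on.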