MCP tools
Carabase ships a Model Context Protocol server at /mcp/sse that exposes the host’s retrieval layer as callable tools. OpenClaw consumes this; any other MCP client (Claude Desktop, MCP Inspector, etc.) can too.
The canonical tool surface lives in the @carabase/mcp-server workspace package — single source of truth, DB-agnostic, importable by any consumer.
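As a rough illustration of what an MCP client sends when it invokes one of these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` envelope defined by the Model Context Protocol. The helper `buildToolCall` and the request-id counter are hypothetical names for illustration only, not part of @carabase/mcp-server. In practice an MCP client library would build this envelope and handle the SSE transport.

```typescript
// JSON-RPC 2.0 envelope for an MCP `tools/call` request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: wraps a tool name and its arguments in the envelope
// an MCP client sends to the server.
function buildToolCall(
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Example payload for the semantic search tool documented below.
const semanticSearchCall = buildToolCall("carabase_search_semantic", {
  query: "what did Alice say about pricing?",
  k: 5,
  min_similarity: 0.7,
});
```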
Read tools
carabase_search_semantic(query, k?, min_similarity?)
pgvector semantic search across artifacts. Returns the top k hits ranked by cosine similarity, with provenance.
{ "query": "what did Alice say about pricing?", "k": 5, "min_similarity": 0.7 }
carabase_search_graph(start_entity, depth?, min_confidence?, source_kinds?)
Knowledge graph traversal from a named entity. Walks edges up to depth hops, filters by edge confidence + provenance.
{ "start_entity": "Alice Chen", "depth": 2, "min_confidence": 0.6, "source_kinds": ["extracted"] }
source_kinds accepts "extracted" (directly observed), "inferred" (LLM-derived), and/or "ambiguous" (awaiting human review).
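A client may want to catch malformed graph-search arguments before sending them. The sketch below is a hypothetical client-side check, not the server's own validation: the three `source_kinds` values come from the text above, while the assumptions that `depth` is a positive integer and `min_confidence` lies in [0, 1] are mine.

```typescript
type SourceKind = "extracted" | "inferred" | "ambiguous";
const SOURCE_KINDS: readonly string[] = ["extracted", "inferred", "ambiguous"];

// Loosely typed input, as it might arrive from an agent or a config file.
interface RawSearchGraphArgs {
  start_entity: string;
  depth?: number;
  min_confidence?: number;
  source_kinds?: string[];
}

function isSourceKind(value: string): value is SourceKind {
  return SOURCE_KINDS.indexOf(value) !== -1;
}

// Hypothetical check: returns a list of problems, empty when the args look valid.
function validateSearchGraphArgs(args: RawSearchGraphArgs): string[] {
  const problems: string[] = [];
  if (args.start_entity.trim() === "") {
    problems.push("start_entity must not be empty");
  }
  // Assumption: depth counts hops, so it must be a positive integer.
  if (args.depth !== undefined && (!Number.isInteger(args.depth) || args.depth < 1)) {
    problems.push("depth must be a positive integer");
  }
  // Assumption: confidence is a score in [0, 1].
  if (
    args.min_confidence !== undefined &&
    (args.min_confidence < 0 || args.min_confidence > 1)
  ) {
    problems.push("min_confidence must be between 0 and 1");
  }
  for (const kind of args.source_kinds ?? []) {
    if (!isSourceKind(kind)) problems.push(`unknown source_kind: ${kind}`);
  }
  return problems;
}
```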
carabase_query_metadata(filters)
Structured queries by entity name / folio id / date range. Use this for “what did I write last Tuesday?” type questions.
carabase_find_entity_candidates(text, hints?)
Disambiguation lookup — given a name fragment (“Alice”), returns candidate canonical entities + their context.
carabase_route_and_execute(query)
Cross-strategy router. Picks between semantic / graph / metadata based on the query shape. Useful as a default entry point when the agent doesn’t know which strategy fits.
carabase_query_timeseries(source, metric, range, aggregation?, group_by?)
Substrate-table time-series queries (Phase 3 PR-B). Reads from connector
substrate tables (strava_activities, tidal_listens) declared via the
manifest’s substrateTable block. Returns pre-aggregated rows so the
agent doesn’t have to re-aggregate downstream. range (with both
from and to) is required — the dispatcher does not provide a
default window.
{ "source": "strava", "metric": "distance_meters", "aggregation": "sum", "group_by": "week", "range": { "from": "2026-01-01", "to": "2026-05-01" } }
group_by accepts "day" | "week" | "month" for time bucketing OR the
metric’s categorical column (e.g. "activity_type" for Strava). When the
underlying substrate has been purged out from under the aggregate, the
result carries metadata.aggregate_status: "aggregate_only" so the agent
can present a freshness caveat.
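Because the dispatcher supplies no default window, a caller can fail fast on an incomplete range before the request leaves the client. The sketch below shows one way to do that and to detect the `aggregate_only` status. `buildTimeseriesArgs`, `needsFreshnessCaveat`, and the `rows` field of the result are hypothetical; only the argument names and `metadata.aggregate_status` come from the description above.

```typescript
interface TimeseriesRange {
  from: string;
  to: string;
}

interface TimeseriesArgs {
  source: string;
  metric: string;
  range: TimeseriesRange;
  aggregation?: string;
  group_by?: string;
}

// Hypothetical builder: throws early when `range` is missing either bound.
function buildTimeseriesArgs(
  source: string,
  metric: string,
  range: Partial<TimeseriesRange>,
  options: { aggregation?: string; group_by?: string } = {},
): TimeseriesArgs {
  if (!range.from || !range.to) {
    throw new Error("range.from and range.to are both required");
  }
  return { source, metric, range: { from: range.from, to: range.to }, ...options };
}

// Assumed result shape: only metadata.aggregate_status is taken from the docs.
interface TimeseriesResult {
  rows: unknown[];
  metadata?: { aggregate_status?: string };
}

// True when the substrate rows were purged and only the aggregate remains,
// so the agent should attach a freshness caveat to its answer.
function needsFreshnessCaveat(result: TimeseriesResult): boolean {
  return result.metadata?.aggregate_status === "aggregate_only";
}
```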
carabase_resolve_mesh_uri(uri)
JIT mesh-resolver for hollow / metadata-only artifacts. Hydrates the
body from the source connector at query time, compacted to a 4,000-token
budget. Returns provenance (connector, uri, fetched_at).
The agent-facing tool exposes a single uri argument today. The
6-check ladder (connector capability → per-account flag → HMAC
confirmation_token → per-batch token cap → daily token budget →
per-workspace concurrency semaphore) lives in the host-side resolver
and currently runs only on text-tier hydrations; the visual-hydration
estimate → confirm → force flow that drives a cost-preview UX exists
in src/services/mesh-resolver.ts but is not yet exposed through the
MCP tool surface.
carabase_traverse_memory_network(anchor[], mode?, depth?, link_types?, min_confidence?, limit?)
Personalized PageRank over the memory-network typed graph (MAXIMAL Phase
6). Seeds at anchor[] (memory UUIDs only — keyword anchoring requires
a prior carabase_search_semantic call to resolve memory IDs first),
walks edges in memory_links filtered by closed-enum link_types and
min_confidence, returns the top-K memories ranked by stationary
distribution.
Modes: "pagerank" (default), "shortest_path", "neighbours". Depth
1–3. The trace header reports iter, converged, optional timed_out,
and fallback (when adjacency cap forces a BFS substitute). When every
returned memory’s source artifact has been purged, a [mesh-tier]
trailer fires so the agent cites memory facts directly.
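For readers unfamiliar with personalized PageRank, the sketch below runs a plain power iteration over an in-memory edge list and reports `iter` and `converged` in the spirit of the trace header. It is an illustration under stated assumptions, not the package's implementation: the `MemoryLink` shape, the damping factor, the tolerance, and the handling of dangling nodes (their mass returns to the anchors) are all my choices, and the depth limit, `timed_out`, and the BFS fallback are left out.

```typescript
// Assumed in-memory edge shape; not the actual memory_links schema.
interface MemoryLink {
  from: string;
  to: string;
  linkType: string;
  confidence: number;
}

interface PageRankOptions {
  damping?: number;
  maxIter?: number;
  tolerance?: number;
  minConfidence?: number;
  linkTypes?: string[];
  limit?: number;
}

function personalizedPageRank(
  links: MemoryLink[],
  anchors: string[],
  opts: PageRankOptions = {},
): { ranked: Array<[string, number]>; trace: { iter: number; converged: boolean } } {
  const { damping = 0.85, maxIter = 100, tolerance = 1e-6, minConfidence = 0, linkTypes, limit = 10 } = opts;
  if (anchors.length === 0) return { ranked: [], trace: { iter: 0, converged: true } };

  // Adjacency list over the edges that pass the link-type and confidence filters.
  const outgoing = new Map<string, string[]>();
  for (const link of links) {
    if (link.confidence < minConfidence) continue;
    if (linkTypes && linkTypes.indexOf(link.linkType) === -1) continue;
    const targets = outgoing.get(link.from) ?? [];
    targets.push(link.to);
    outgoing.set(link.from, targets);
  }

  // Teleport vector: uniform over the anchor memories.
  const teleport = new Map<string, number>();
  for (const anchor of anchors) {
    teleport.set(anchor, (teleport.get(anchor) ?? 0) + 1 / anchors.length);
  }

  let rank = new Map<string, number>(teleport);
  let iter = 0;
  let converged = false;
  while (iter < maxIter && !converged) {
    iter += 1;
    const next = new Map<string, number>();
    let danglingMass = 0;
    rank.forEach((score, node) => {
      const targets = outgoing.get(node);
      if (!targets) {
        danglingMass += score; // no outgoing edges: mass goes back to the anchors
        return;
      }
      for (const target of targets) {
        next.set(target, (next.get(target) ?? 0) + (damping * score) / targets.length);
      }
    });
    const restart = 1 - damping + damping * danglingMass;
    teleport.forEach((weight, anchor) => {
      next.set(anchor, (next.get(anchor) ?? 0) + restart * weight);
    });

    // L1 distance between successive distributions decides convergence.
    let delta = 0;
    next.forEach((score, node) => {
      delta += Math.abs(score - (rank.get(node) ?? 0));
    });
    rank.forEach((score, node) => {
      if (!next.has(node)) delta += score;
    });
    rank = next;
    converged = delta < tolerance;
  }

  const ranked = Array.from(rank.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
  return { ranked, trace: { iter, converged } };
}
```

Seeding at a single anchor concentrates the stationary mass on memories reachable from it, which is why the tool requires memory UUIDs as anchors instead of free-text keywords.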
carabase_verify_hypothesis(claim, limit?)
Checks whether a natural-language claim is corroborated or contradicted. Returns:
{ "verdict": "corroborated" | "contradicted" | "mixed" | "inconclusive", "corroborated_by": [...], "contradicted_by": [...], "considered": 8 }
Deterministic heuristic (no LLM required for v0.1) — see Knowledge graph → Hypothesis verification.
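One plausible reading of how the four verdicts relate to the two evidence lists is sketched below. This mapping is an assumption made for illustration. The actual heuristic is described in the Knowledge graph documentation and may weigh confidence or evidence counts differently.

```typescript
type Verdict = "corroborated" | "contradicted" | "mixed" | "inconclusive";

// Hypothetical mapping from evidence lists to a verdict:
// evidence on both sides is "mixed", on neither side "inconclusive".
function deriveVerdict(corroboratedBy: unknown[], contradictedBy: unknown[]): Verdict {
  const hasSupport = corroboratedBy.length > 0;
  const hasConflict = contradictedBy.length > 0;
  if (hasSupport && hasConflict) return "mixed";
  if (hasSupport) return "corroborated";
  if (hasConflict) return "contradicted";
  return "inconclusive";
}
```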
Write tools
commit_to_folio(folioName, content, extraMetadata?)
Push LLM-generated prose into a folio. Splits into chunks, embeds each, links via a single commit row. Used by the dream injector and the agent’s research-and-summarize flows.
carabase_attach_file (Agentic Flows v1)
Available only in the in-process agent-runtime path (not over /mcp/sse). Attaches a file artifact to the originating logCard of an agentic flow run. Used by Claude / Codex providers in v0.2.
External MCP servers (Lane 1)
The agent runtime can additionally pull tools from per-workspace
external MCP servers registered in the external_mcp_servers table.
At session start the runtime queries each enabled server, namespaces
its tool list, and lays the discovered tools alongside the canonical
Carabase surface. This is how the Beeper Lane C MCP-client integration
exposes Beeper Desktop’s local /v0/mcp endpoint to the agent — see
the Beeper OAuth setup guide.
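The namespacing step can be pictured roughly as follows. The text above only says that discovered tools are namespaced, so the `__` separator, the server keys, and the helper names in this sketch are assumptions, and the tool name `send_message` is a made-up example.

```typescript
interface ToolDescriptor {
  name: string;
  description?: string;
}

// Hypothetical scheme: prefix each external tool with its server key so it
// cannot collide with the canonical carabase_* tools.
function namespaceTools(
  serverKey: string,
  tools: ToolDescriptor[],
  separator = "__",
): ToolDescriptor[] {
  return tools.map((tool) => ({ ...tool, name: `${serverKey}${separator}${tool.name}` }));
}

// Lays the namespaced external tools alongside the canonical surface.
function mergeToolSurfaces(
  canonical: ToolDescriptor[],
  external: Record<string, ToolDescriptor[]>,
): ToolDescriptor[] {
  const merged = canonical.slice();
  for (const serverKey of Object.keys(external)) {
    merged.push(...namespaceTools(serverKey, external[serverKey]));
  }
  return merged;
}
```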
Resources
The MCP server also exposes resources (URIs the agent can call resources/read on):
| URI Pattern | What it returns |
|---|---|
| carabase://artifact/{id} | Lazy artifact body fetch. Tool results return URIs of this form so the agent only fetches bodies it actually needs |
| folio://{id}/readme | Folio name + README summary |
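A client that receives these URIs in tool results might classify them and build the matching `resources/read` request as in the sketch below. The `resources/read` method and its `uri` parameter are part of the MCP specification. The parser, its type names, and the assumption that ids contain no `/` are hypothetical.

```typescript
type CarabaseResource =
  | { kind: "artifact"; id: string }
  | { kind: "folio_readme"; id: string };

// Classifies a URI against the two patterns in the table above.
// Returns null for anything else. Assumes ids never contain "/".
function parseResourceUri(uri: string): CarabaseResource | null {
  const artifact = /^carabase:\/\/artifact\/([^/]+)$/.exec(uri);
  if (artifact) return { kind: "artifact", id: artifact[1] };
  const folio = /^folio:\/\/([^/]+)\/readme$/.exec(uri);
  if (folio) return { kind: "folio_readme", id: folio[1] };
  return null;
}

// JSON-RPC 2.0 envelope for an MCP `resources/read` request.
function buildResourceRead(requestId: number, uri: string) {
  return {
    jsonrpc: "2.0" as const,
    id: requestId,
    method: "resources/read" as const,
    params: { uri },
  };
}
```

Fetching bodies only for URIs that parse as `artifact` keeps the agent's context small, which is the point of returning URIs from tool results.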
Hint repair (Doctor-RAG)
When a Carabase tool returns 0 results or isError: true, the package appends a structured [hint: ...] + [trace: ...] trailer to the response so the agent can course-correct without resetting its chain of thought. Hints are deterministic v1 (no LLM); the LLM-driven generator slots in via the Sampler interface in v0.2.
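On the consuming side, an agent harness could separate the trailer from the response body along these lines. This is a sketch that assumes each trailer is a single bracketed segment with no nested `]`; the exact trailer grammar is not specified here, and `parseTrailer` is a hypothetical helper.

```typescript
interface ParsedTrailer {
  body: string;
  hint: string | null;
  trace: string | null;
}

// Extracts the first [hint: ...] and [trace: ...] segments and returns the
// remaining text as the body. Assumes segments contain no nested "]".
function parseTrailer(text: string): ParsedTrailer {
  const hintMatch = /\[hint:\s*([^\]]*)\]/.exec(text);
  const traceMatch = /\[trace:\s*([^\]]*)\]/.exec(text);
  const body = text.replace(/\[(?:hint|trace):[^\]]*\]/g, "").trim();
  return {
    body,
    hint: hintMatch ? hintMatch[1].trim() : null,
    trace: traceMatch ? traceMatch[1].trim() : null,
  };
}
```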
Format-aware compaction
Artifact body responses through carabase://artifact/{id} can be auto-compacted to fit a token budget. CSVs preserve the header + predicate-pushdown rows; markdown keeps headings + matching paragraphs; PDFs split by \f and rank pages by query-token frequency. Opt-in via CarabaseToolContext.compaction = { maxTokens } — off by default in the host shim.
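The PDF strategy described above can be sketched as follows: split the extracted text on form feeds, score each page by how many of its tokens appear in the query, and keep the best pages that fit the budget. This is a simplified illustration, not the package's code. In particular, the lowercase alphanumeric tokenizer and the use of word counts as a stand-in for model tokens are assumptions.

```typescript
// Crude tokenizer: lowercase alphanumeric runs. Real token counts will differ.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Keeps the pages with the most query-token hits, within maxTokens,
// and returns them in their original page order.
function compactPdfText(pdfText: string, query: string, maxTokens: number): string {
  const queryTokens = new Set(tokenize(query));
  const pages = pdfText.split("\f").map((text, index) => {
    const tokens = tokenize(text);
    const score = tokens.filter((token) => queryTokens.has(token)).length;
    return { index, text, tokenCount: tokens.length, score };
  });

  // Highest score first; ties keep the earlier page.
  const byScore = pages.slice().sort((a, b) => b.score - a.score || a.index - b.index);

  const kept: typeof pages = [];
  let used = 0;
  for (const page of byScore) {
    if (page.score === 0) break; // remaining pages never mention the query
    if (used + page.tokenCount > maxTokens) continue; // skip pages that do not fit
    kept.push(page);
    used += page.tokenCount;
  }

  return kept
    .sort((a, b) => a.index - b.index)
    .map((page) => page.text)
    .join("\f");
}
```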
Adding a new retrieval tool
Goes in @carabase/mcp-server, not in the host. The host’s src/services/mcp-server.ts is a thin shim that mounts the package + registers legacy aliases for backwards compat (search_semantic → carabase_search_semantic, etc.).
Workflow:
- Add the strategy to @carabase/retrieval if it’s a new query pattern
- Add the tool definition + handler to packages/mcp-server/src/tools.ts
- Add a hint generator in packages/mcp-server/src/hints/ for the empty-result + error cases
- Add an assertion in pnpm smoke:mcp so the tool is exercised in the in-memory adapter
- Add a trajectory in packages/mcp-server/eval/baselines/agent-trajectories.snapshot.json so the nightly agent-eval covers it
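To make the division of labour concrete, the sketch below models a tool definition plus handler and the shim's alias lookup as a tiny registry. Everything here is hypothetical: the real shapes live in packages/mcp-server/src/tools.ts and may differ, handlers there are likely asynchronous, and `carabase_example_tool` is a placeholder name. The `name` / `description` / `inputSchema` fields follow the MCP tool descriptor, and the one alias shown is the one named above.

```typescript
// MCP-style tool descriptor (name, description, JSON Schema for the input).
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
}

interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

// Simplification: synchronous handlers keep the sketch short.
type ToolHandler = (args: Record<string, unknown>) => ToolResult;

class ToolRegistry {
  private tools = new Map<string, { definition: ToolDefinition; handler: ToolHandler }>();
  private aliases = new Map<string, string>();

  register(definition: ToolDefinition, handler: ToolHandler): void {
    this.tools.set(definition.name, { definition, handler });
  }

  // Legacy name -> canonical name, as the host shim does for backwards compat.
  alias(legacyName: string, canonicalName: string): void {
    this.aliases.set(legacyName, canonicalName);
  }

  resolve(requestedName: string): string {
    return this.aliases.get(requestedName) ?? requestedName;
  }

  call(requestedName: string, args: Record<string, unknown>): ToolResult {
    const entry = this.tools.get(this.resolve(requestedName));
    if (!entry) {
      return { content: [{ type: "text", text: `unknown tool: ${requestedName}` }], isError: true };
    }
    return entry.handler(args);
  }
}

const registry = new ToolRegistry();
registry.register(
  {
    name: "carabase_example_tool", // placeholder, not a real Carabase tool
    description: "Echoes the query back; stands in for a real retrieval strategy.",
    inputSchema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
  },
  (args) => ({ content: [{ type: "text", text: `echo: ${String(args["query"])}` }] }),
);
registry.alias("search_semantic", "carabase_search_semantic");
```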