Knowledge graph
Carabase maintains a knowledge graph of every entity you’ve mentioned — people, projects, organizations, concepts, tools, topics, events. The graph is incrementally extracted from your daily notes by the harvest pipeline, then exposed to the agent through the MCP server so it can do things like “what did Alice ship for the fundraise project last quarter?” without you having to manually link anything.
Entities
A row in the entities table:
```
{
  id: uuid,
  workspace_id: uuid,
  name: "Alice Chen",
  type: "person",       // person, project, concept, organization, tool, topic, event
  is_canonical: true,   // false for aliases that point at the canonical row
  canonical_id: null,   // the parent canonical entity if this is an alias
  metadata: {
    concept_role: "self" | "primary_org" | null, // marks special roots
    fixture_id: "...",  // debug-only, present on seeded test data
    ...
  }
}
```

The `concept_role` field marks two special entities per workspace: the self (the user the workspace is about) and the primary org (their employer). Both are surfaced in agent context to anchor pronoun resolution and "we" references.
Aliases
People go by multiple names. The graph supports this with an alias mechanism: a non-canonical entity row points at a canonical one via `canonical_id`. The corpus curator (a nightly worker) proposes alias merges; you accept or reject them through the `curation_suggestions` UI.
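As a minimal sketch of how an alias row resolves to its canonical entity (the row shape follows the entities table above; `resolveCanonical` and the in-memory `Map` are illustrative stand-ins for the real SQL lookup):

```typescript
type Entity = {
  id: string;
  name: string;
  is_canonical: boolean;
  canonical_id: string | null;
};

// Illustrative: follow canonical_id pointers until we reach a canonical row.
// In practice this would be a join against the entities table.
function resolveCanonical(id: string, table: Map<string, Entity>): Entity {
  let row = table.get(id);
  if (!row) throw new Error(`unknown entity ${id}`);
  while (!row.is_canonical && row.canonical_id) {
    const parent = table.get(row.canonical_id);
    if (!parent) break;
    row = parent;
  }
  return row;
}
```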
A row in the edges table:
```
{
  id, workspace_id,
  source_id, target_id,   // entity ids
  type: "works_at",       // free-form predicate
  source_kind: "extracted" | "inferred" | "ambiguous",
  confidence: 0.85,       // [0, 1]
  provenance: { ... }     // jsonb bag with the producing generator's evidence
}
```

The three provenance fields together let downstream consumers ask trust-aware questions:
- `carabase_search_graph` exposes `min_confidence` + `source_kinds` filters that push down to SQL
- The default formatter annotates non-extracted edges with `[inferred 0.62]`-style suffixes when surfacing them to the agent
- The curator can produce `'inferred'` edges with low confidence; the agent sees them but knows to weigh them less than first-hand observations
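The annotation behavior can be sketched like this (the function name is an assumption; only the edge fields and the `[inferred 0.62]` suffix format come from this page):

```typescript
type Edge = {
  type: string;
  source_kind: "extracted" | "inferred" | "ambiguous";
  confidence: number;
};

// Illustrative: only non-extracted edges get a trust suffix,
// e.g. "works_at [inferred 0.62]"; extracted edges pass through bare.
function formatEdge(edge: Edge): string {
  if (edge.source_kind === "extracted") return edge.type;
  return `${edge.type} [${edge.source_kind} ${edge.confidence.toFixed(2)}]`;
}
```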
Adding a new edge producer? Set both `source_kind` and `confidence` explicitly. A forgotten value defaults to `'extracted'` / `1.0`, which silently promotes junk into high-trust territory.
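One way to make the explicit-value rule hard to forget is a constructor with no defaults; this helper is hypothetical, not part of the codebase:

```typescript
type SourceKind = "extracted" | "inferred" | "ambiguous";

// Hypothetical helper: both trust fields are required positional parameters,
// so a new producer cannot silently fall back to 'extracted' / 1.0.
function makeEdge(
  source_id: string,
  target_id: string,
  type: string,
  source_kind: SourceKind,
  confidence: number,
) {
  if (confidence < 0 || confidence > 1) throw new Error("confidence must be in [0, 1]");
  return { source_id, target_id, type, source_kind, confidence };
}
```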
How the graph gets built
- Daily notes accumulate logCards (from your typing, from connectors, from agentic flows)
- The harvest pipeline reads each logCard, calls an LLM (the `utility-high` model role) to extract `{ entities[], relationships[] }`
- Entities are upserted (with alias resolution); edges are inserted with `source_kind: "extracted"` and `confidence: 1.0`
- The memory-graph bridge translates Mem0 facts into edges with appropriate provenance
- Nightly, the corpus curator walks the graph and suggests alias merges, role enrichment, edge inferences, and stale-entity cleanup — all as `curation_suggestions` for you to accept or reject
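The extraction-to-edge step above can be sketched as follows (the `{ entities[], relationships[] }` shape comes from this page; the exact field names inside each item are assumptions):

```typescript
// Assumed shape of one harvest extraction, mirroring the pipeline steps above.
type HarvestResult = {
  entities: { name: string; type: string }[];
  relationships: { source: string; target: string; type: string }[];
};

// Illustrative: extracted edges always carry full trust,
// matching "source_kind: 'extracted'" and "confidence: 1.0" above.
function toEdgeRows(result: HarvestResult, workspace_id: string) {
  return result.relationships.map((r) => ({
    workspace_id,
    source_name: r.source,
    target_name: r.target,
    type: r.type,
    source_kind: "extracted" as const,
    confidence: 1.0,
  }));
}
```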
How the agent queries it
Through the MCP tools that ship in `@carabase/mcp-server`:
- `carabase_search_semantic(query, k?)` — pgvector semantic search across artifacts
- `carabase_search_graph(start_entity, depth?, min_confidence?, source_kinds?)` — graph traversal from a named entity
- `carabase_query_metadata(filters)` — structured queries by entity name / folio / date
- `carabase_find_entity_candidates(text)` — disambiguation lookup
- `carabase_route_and_execute(query)` — picks the right strategy based on the query shape
- `carabase_verify_hypothesis(claim)` — semantic search + heuristic NLI to corroborate or contradict
Each result includes provenance + confidence so the agent can phrase its answer with appropriate hedging.
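One possible confidence-to-hedging mapping looks like this; the bands and wording are assumptions for illustration, not Carabase's actual prompt logic:

```typescript
// Illustrative only: map an edge's trust signals to a hedging prefix
// the agent could prepend when phrasing an answer.
function hedge(source_kind: string, confidence: number): string {
  if (source_kind === "extracted" && confidence >= 0.9) return "";
  if (confidence >= 0.7) return "likely ";
  if (confidence >= 0.4) return "possibly ";
  return "uncertain: ";
}
```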
Hypothesis verification (FLARE)
`verifyHypothesis` is a deterministic, no-LLM-required heuristic for resolving factual uncertainty before the agent commits to an answer. Given a natural-language claim, it:
- Runs a semantic search to find supporting passages
- Tokenizes the claim (minus stopwords)
- Measures content-token overlap with each hit
- Flags a hit as contradictory when any sentence containing a claim-token also contains a negation cue (`not`, `no`, `never`, `denies`, `refuted`, `false`, `didn't`, `isn't`, `wasn't`, `won't`, `without`, …)
- Returns `{ verdict: 'corroborated' | 'contradicted' | 'mixed' | 'inconclusive', corroborated_by, contradicted_by, considered }`
The agent calls this before answering questions of the form “did X happen?” — letting it correct itself instead of confidently stating something the corpus disagrees with.
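The steps above can be sketched deterministically; the stopword list, the 0.5 overlap threshold, and the trimmed negation-cue set here are simplified placeholders, and the real heuristic may differ:

```typescript
const STOPWORDS = new Set(["the", "a", "an", "of", "to", "in", "on", "did", "is", "was"]);
const NEGATION_CUES = ["not", "no", "never", "denies", "refuted", "false", "didn't", "isn't", "wasn't", "won't", "without"];

// Content tokens: lowercase words (apostrophes kept for "didn't"), stopwords dropped.
function tokens(text: string): string[] {
  return text.toLowerCase().match(/[a-z']+/g)?.filter((t) => !STOPWORDS.has(t)) ?? [];
}

type Verdict = "corroborated" | "contradicted" | "mixed" | "inconclusive";

// Simplified sketch of the claim-vs-hits check: a hit corroborates when it
// shares enough content tokens with the claim, and contradicts when a
// sentence containing a claim token also contains a negation cue.
function verify(claim: string, hits: string[]): Verdict {
  const claimTokens = new Set(tokens(claim));
  let corroborated = 0;
  let contradicted = 0;
  for (const hit of hits) {
    const hitTokens = new Set(tokens(hit));
    const overlap = Array.from(claimTokens).filter((t) => hitTokens.has(t)).length;
    const negated = hit.split(/[.!?]/).some((sentence) => {
      const st = tokens(sentence);
      return st.some((t) => claimTokens.has(t)) && NEGATION_CUES.some((c) => st.includes(c));
    });
    if (negated) contradicted++;
    else if (claimTokens.size > 0 && overlap / claimTokens.size >= 0.5) corroborated++;
  }
  if (corroborated > 0 && contradicted > 0) return "mixed";
  if (corroborated > 0) return "corroborated";
  if (contradicted > 0) return "contradicted";
  return "inconclusive";
}
```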