Knowledge graph

Carabase maintains a knowledge graph of every entity you’ve mentioned — people, projects, organizations, concepts, tools, topics, events. The graph is incrementally extracted from your daily notes by the harvest pipeline, then exposed to the agent through the MCP server so it can do things like “what did Alice ship for the fundraise project last quarter?” without you having to manually link anything.

A row in the entities table:

{
  id: uuid,
  workspace_id: uuid,
  name: "Alice Chen",
  type: "person",        // person, project, concept, organization, tool, topic, event
  is_canonical: true,    // false for aliases that point at the canonical row
  canonical_id: null,    // the parent canonical entity if this is an alias
  metadata: {
    concept_role: "self" | "primary_org" | null, // marks special roots
    fixture_id: "...",   // debug-only, present on seeded test data
    ...
  }
}

The concept_role field marks two special entities per workspace: the self (the user the workspace is about) and the primary org (their employer). Both are surfaced in agent context to anchor pronoun resolution and “we” references.

People go by multiple names. The graph supports this with an alias mechanism: a non-canonical entity row points at a canonical one via canonical_id. The corpus curator (a nightly worker) proposes alias merges; you accept or reject them through the curation_suggestions UI.
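The lookup side of that mechanism is plain pointer-chasing. A minimal sketch, assuming an in-memory mirror of the entities table (the shape and function name are illustrative, not Carabase's actual code):

```typescript
// Minimal alias-resolution sketch mirroring the entities table (illustrative).
interface EntityRow {
  id: string;
  name: string;
  is_canonical: boolean;
  canonical_id: string | null;
}

// Follow canonical_id pointers until a canonical row is reached.
function resolveCanonical(id: string, byId: Map<string, EntityRow>): EntityRow {
  let row = byId.get(id);
  if (!row) throw new Error(`unknown entity ${id}`);
  const seen = new Set<string>();
  while (!row.is_canonical && row.canonical_id !== null) {
    if (seen.has(row.id)) throw new Error("alias cycle"); // merges shouldn't create these
    seen.add(row.id);
    const parent = byId.get(row.canonical_id);
    if (!parent) break; // dangling pointer: treat the alias row as terminal
    row = parent;
  }
  return row;
}
```

The cycle guard is defensive: accepted merges should form a tree, but a resolver that loops forever on bad data is worse than one that throws.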

A row in the edges table:

{
  id, workspace_id,
  source_id, target_id,  // entity ids
  type: "works_at",      // free-form predicate
  source_kind: "extracted" | "inferred" | "ambiguous",
  confidence: 0.85,      // [0, 1]
  provenance: { ... }    // jsonb bag with the producing generator's evidence
}

Together, the three provenance fields (source_kind, confidence, provenance) let downstream consumers ask trust-aware questions:

  • carabase_search_graph exposes min_confidence + source_kinds filters that push down to SQL
  • The default formatter annotates non-extracted edges with [inferred 0.62]-style suffixes when surfacing them to the agent
  • The curator can produce 'inferred' edges with low confidence; the agent sees them but knows to weigh them less than first-hand observations
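The formatter's annotation rule can be sketched in a few lines. The function name and exact layout here are assumptions; only the `[inferred 0.62]` suffix format comes from this doc:

```typescript
// Suffix-annotation sketch: extracted edges pass through bare,
// inferred/ambiguous edges get a "[kind confidence]" suffix.
type SourceKind = "extracted" | "inferred" | "ambiguous";

function annotateEdgeLabel(label: string, sourceKind: SourceKind, confidence: number): string {
  if (sourceKind === "extracted") return label;
  return `${label} [${sourceKind} ${confidence.toFixed(2)}]`;
}
```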

Adding a new edge producer? Set both source_kind and confidence explicitly. A forgotten value defaults to 'extracted' / 1.0, which silently promotes junk into high-trust territory.
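One way to enforce that discipline is a constructor where both fields are required parameters, so the type checker rejects a producer that omits them. This is a hypothetical helper, not part of the actual codebase:

```typescript
// Hypothetical defensive constructor: both provenance fields are required,
// so a new producer can't silently inherit 'extracted' / 1.0.
type SourceKind = "extracted" | "inferred" | "ambiguous";

interface NewEdge {
  source_id: string;
  target_id: string;
  type: string;
  source_kind: SourceKind;
  confidence: number;
}

function makeEdge(
  source_id: string,
  target_id: string,
  type: string,
  source_kind: SourceKind, // no default: callers must decide
  confidence: number,      // no default: callers must decide
): NewEdge {
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError(`confidence must be in [0, 1], got ${confidence}`);
  }
  return { source_id, target_id, type, source_kind, confidence };
}
```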

The graph is built in five stages:

  1. Daily notes accumulate logCards (from your typing, from connectors, from agentic flows)
  2. The harvest pipeline reads each logCard, calls an LLM (the utility-high model role) to extract { entities[], relationships[] }
  3. Entities are upserted (with alias resolution); edges are inserted with source_kind: "extracted" and confidence: 1.0
  4. The memory-graph bridge translates Mem0 facts into edges with appropriate provenance
  5. Nightly, the corpus curator walks the graph and suggests alias merges, role enrichment, edge inferences, and stale-entity cleanup — all as curation_suggestions for you to accept or reject
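Step 3 can be sketched as a pure shaping function. The `{ entities[], relationships[] }` envelope and the `"extracted"` / 1.0 provenance come from the steps above; the field names inside relationships are assumptions:

```typescript
// Sketch of step 3: shaping an extraction result into edge rows.
interface Extraction {
  entities: { name: string; type: string }[];
  relationships: { source: string; predicate: string; target: string }[];
}

interface ExtractedEdgeRow {
  source_id: string;
  target_id: string;
  type: string;
  source_kind: "extracted";
  confidence: number;
}

// idByName would come from the entity upsert + alias resolution in the same step.
function toEdgeRows(x: Extraction, idByName: Map<string, string>): ExtractedEdgeRow[] {
  const rows: ExtractedEdgeRow[] = [];
  for (const r of x.relationships) {
    const source_id = idByName.get(r.source);
    const target_id = idByName.get(r.target);
    if (!source_id || !target_id) continue; // skip relationships to unknown entities
    rows.push({ source_id, target_id, type: r.predicate, source_kind: "extracted", confidence: 1.0 });
  }
  return rows;
}
```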

You query the graph through the MCP tools that ship in @carabase/mcp-server:

  • carabase_search_semantic(query, k?) — pgvector semantic search across artifacts
  • carabase_search_graph(start_entity, depth?, min_confidence?, source_kinds?) — graph traversal from a named entity
  • carabase_query_metadata(filters) — structured queries by entity name / folio / date
  • carabase_find_entity_candidates(text) — disambiguation lookup
  • carabase_route_and_execute(query) — picks the right strategy based on the query shape
  • carabase_verify_hypothesis(claim) — semantic search + heuristic NLI to corroborate or contradict

Each result includes provenance + confidence so the agent can phrase its answer with appropriate hedging.
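The min_confidence + source_kinds pushdown from carabase_search_graph might compile to SQL along these lines. This is an illustrative query-builder sketch against the edges schema above; the server's actual SQL may differ:

```typescript
// Illustrative pushdown: translate min_confidence / source_kinds into a
// parameterized WHERE clause over the edges table ($1, $2, ... placeholders).
function edgeFilterSql(minConfidence?: number, sourceKinds?: string[]): { where: string; params: unknown[] } {
  const clauses: string[] = [];
  const params: unknown[] = [];
  if (minConfidence !== undefined) {
    params.push(minConfidence);
    clauses.push(`confidence >= $${params.length}`);
  }
  if (sourceKinds !== undefined && sourceKinds.length > 0) {
    params.push(sourceKinds);
    clauses.push(`source_kind = ANY($${params.length})`);
  }
  return { where: clauses.length > 0 ? clauses.join(" AND ") : "TRUE", params };
}
```

Parameterizing (rather than interpolating) keeps the filter values out of the SQL string, which matters since source_kinds arrives from tool input.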

verifyHypothesis is a deterministic heuristic (no LLM call required) for resolving factual uncertainty before the agent commits to an answer. Given a natural-language claim, it:

  1. Runs a semantic search to find supporting passages
  2. Tokenizes the claim (minus stopwords)
  3. Measures content-token overlap with each hit
  4. Flags a hit as contradictory when any sentence containing a claim-token also contains a negation cue (not, no, never, denies, refuted, false, didn't, isn't, wasn't, won't, without, …)
  5. Returns { verdict: 'corroborated' | 'contradicted' | 'mixed' | 'inconclusive', corroborated_by, contradicted_by, considered }

The agent calls this before answering questions of the form “did X happen?” — letting it correct itself instead of confidently stating something the corpus disagrees with.
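The whole heuristic is small enough to sketch end to end. The stopword list, overlap threshold, and cue set below are illustrative choices, not the shipped values:

```typescript
// Deterministic verify-hypothesis sketch: content-token overlap + negation cues.
const STOPWORDS = new Set(["the", "a", "an", "is", "was", "did", "do", "to", "of", "in", "on", "at", "and", "for"]);
const NEGATION_CUES = new Set(["not", "no", "never", "denies", "refuted", "false", "didn't", "isn't", "wasn't", "won't", "without"]);

// Lowercase, keep letter/apostrophe runs, drop stopwords.
function contentTokens(text: string): string[] {
  return (text.toLowerCase().match(/[a-z']+/g) ?? []).filter((t) => !STOPWORDS.has(t));
}

type Verdict = "corroborated" | "contradicted" | "mixed" | "inconclusive";

function verifyHypothesisSketch(
  claim: string,
  passages: string[], // stand-ins for the semantic-search hits from step 1
  minOverlap = 2,     // content-token overlap needed to count a hit at all
): { verdict: Verdict; corroborated_by: string[]; contradicted_by: string[]; considered: number } {
  const claimTokens = new Set(contentTokens(claim));
  const corroborated_by: string[] = [];
  const contradicted_by: string[] = [];
  for (const p of passages) {
    const overlap = contentTokens(p).filter((t) => claimTokens.has(t));
    if (overlap.length < minOverlap) continue; // passage isn't about this claim
    // Contradictory iff some sentence shares a claim token AND contains a negation cue.
    const contradicts = p.split(/[.!?]+/).some((sentence) => {
      const st = contentTokens(sentence);
      return st.some((t) => claimTokens.has(t)) && st.some((t) => NEGATION_CUES.has(t));
    });
    (contradicts ? contradicted_by : corroborated_by).push(p);
  }
  const verdict: Verdict =
    corroborated_by.length && contradicted_by.length ? "mixed"
      : corroborated_by.length ? "corroborated"
        : contradicted_by.length ? "contradicted"
          : "inconclusive";
  return { verdict, corroborated_by, contradicted_by, considered: passages.length };
}
```

The overlap gate runs before the negation check, so an unrelated passage that happens to contain "never" can't flip the verdict.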