Memory System

FERAL’s memory is a four-tier architecture stored in a single SQLite database (~/.feral/memory.db). Each tier serves a different retention and retrieval pattern. On top of the tiers sit hybrid search, diversity reranking, session compaction, wiki compilation, and P2P sync.

Four Memory Tiers

Working Memory

In-RAM context for the current session. Holds the conversation history, tool results, and scratch state. Cleared when the session ends.
session.working_memory.append({
    "role": "user",
    "content": "What's the weather in NYC?",
})
Working memory is capped at a configurable token budget. When it overflows, the oldest messages are compacted into an episode (see Session Compaction).
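A minimal sketch of the overflow check, assuming a token-counting callback; the function and variable names here are illustrative, not FERAL's actual API:

```python
# Hypothetical overflow check for the working-memory token budget.
def needs_compaction(messages, budget_tokens, count_tokens):
    """Return True when total token usage exceeds the budget."""
    total = sum(count_tokens(m["content"]) for m in messages)
    return total > budget_tokens

# Toy token counter: whitespace word count stands in for a real tokenizer.
toy_count = lambda text: len(text.split())

msgs = [{"role": "user", "content": "What's the weather in NYC?"}]
print(needs_compaction(msgs, budget_tokens=8000, count_tokens=toy_count))
```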

Episodic Memory

Auto-generated summaries of past conversations. Each episode captures the key facts, decisions, and outcomes from a session.
-- Schema
CREATE TABLE episodes (
    id TEXT PRIMARY KEY,
    session_id TEXT,
    summary TEXT,
    entities TEXT,        -- JSON array of extracted entities
    created_at REAL,
    embedding BLOB        -- 384-dim float32 vector
);
CREATE VIRTUAL TABLE episodes_fts USING fts5(summary, entities);
Episodes are created automatically when a session ends or when working memory overflows.
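Populating the schema above looks roughly like this; the embedding is a placeholder blob standing in for a real 384-dim sentence-transformer vector:

```python
import json
import sqlite3
import time
import uuid

# Create the episodes table from the schema above in an in-memory DB.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE episodes (
    id TEXT PRIMARY KEY, session_id TEXT, summary TEXT,
    entities TEXT, created_at REAL, embedding BLOB)""")

episode = {
    "id": uuid.uuid4().hex,
    "session_id": "sess-1",
    "summary": "User asked about NYC weather; agent answered.",
    "entities": json.dumps(["NYC", "weather"]),
    "created_at": time.time(),
    "embedding": bytes(384 * 4),  # 384 float32 zeros as a stand-in
}
db.execute(
    "INSERT INTO episodes VALUES "
    "(:id, :session_id, :summary, :entities, :created_at, :embedding)",
    episode,
)
print(db.execute("SELECT count(*) FROM episodes").fetchone()[0])  # 1
```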

Semantic Memory / Knowledge Graph

Persistent facts stored as subject-predicate-object triples. Extracted automatically from conversations or added explicitly via “remember X” commands.
CREATE TABLE knowledge_graph (
    id TEXT PRIMARY KEY,
    subject TEXT,
    predicate TEXT,
    object TEXT,
    confidence REAL,
    source_episode TEXT,
    created_at REAL
);
# User says: "Remember that my doctor's name is Dr. Chen"
# Extracted triple:
# subject=user, predicate=doctor_name, object="Dr. Chen", confidence=0.95
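A sketch of the write path for a "remember" command against the schema above; the `remember` helper and its ID scheme are assumptions for illustration:

```python
import sqlite3
import time

# Build the knowledge_graph table from the schema above.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE knowledge_graph (
    id TEXT PRIMARY KEY, subject TEXT, predicate TEXT, object TEXT,
    confidence REAL, source_episode TEXT, created_at REAL)""")

def remember(subject, predicate, obj, confidence, source_episode=None):
    """Insert one subject-predicate-object triple (toy ID scheme)."""
    db.execute(
        "INSERT INTO knowledge_graph VALUES (?, ?, ?, ?, ?, ?, ?)",
        (f"{subject}:{predicate}", subject, predicate, obj,
         confidence, source_episode, time.time()),
    )

remember("user", "doctor_name", "Dr. Chen", 0.95)
row = db.execute(
    "SELECT object FROM knowledge_graph WHERE predicate = 'doctor_name'"
).fetchone()
print(row[0])  # Dr. Chen
```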

Execution Log

An append-only log of every tool invocation, including arguments, results, latency, and success/failure status.
CREATE TABLE execution_log (
    id TEXT PRIMARY KEY,
    session_id TEXT,
    tool_name TEXT,
    args TEXT,           -- JSON
    result TEXT,         -- JSON
    latency_ms INTEGER,
    success BOOLEAN,
    created_at REAL
);
Useful for debugging, skill auto-generation, and auditing what the agent actually did.

Hybrid Search

Memory retrieval combines SQLite FTS5 (keyword matching) with vector similarity (semantic matching) to get the best of both worlds.
from feral_core.memory import MemoryStore

store = MemoryStore()

results = await store.search(
    query="Dr. Chen appointment",
    top_k=10,
    strategy="hybrid",    # "fts", "vector", or "hybrid"
    alpha=0.6,            # weight: 0.0 = pure FTS, 1.0 = pure vector
)
How hybrid scoring works:
  1. FTS5 returns top-N by BM25 score, normalized to [0, 1].
  2. Vector search returns top-N by cosine similarity, already in [0, 1].
  3. Scores are combined: final = alpha * vector_score + (1 - alpha) * fts_score.
  4. Results are merged and deduplicated by ID.
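Steps 3 and 4 can be sketched as follows; the scores are made up for illustration, and a document missing from one list simply contributes 0 from that side:

```python
def hybrid_merge(fts, vec, alpha=0.6):
    """fts/vec: dicts mapping doc id -> normalized score in [0, 1].

    Blend per-ID scores, then return IDs sorted by combined score.
    """
    combined = {}
    for doc_id in set(fts) | set(vec):
        combined[doc_id] = (alpha * vec.get(doc_id, 0.0)
                            + (1 - alpha) * fts.get(doc_id, 0.0))
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

fts_scores = {"ep1": 0.9, "ep2": 0.4}   # normalized BM25
vec_scores = {"ep2": 0.8, "ep3": 0.7}   # cosine similarity
print(hybrid_merge(fts_scores, vec_scores, alpha=0.6))
```

With alpha=0.6, ep2 wins (0.6 * 0.8 + 0.4 * 0.4 = 0.64) despite neither list ranking it first on its own, which is exactly the behavior hybrid scoring is after.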
Vector embeddings use all-MiniLM-L6-v2 (384 dimensions) by default, computed locally via sentence-transformers. For larger deployments, swap in OpenAI text-embedding-3-small via config.

MMR Diversity Reranking

After hybrid search, Maximal Marginal Relevance reranks results to reduce redundancy. Without MMR, the top-5 results might all describe the same event from different angles.
results = await store.search(
    query="meeting notes",
    top_k=5,
    mmr=True,
    mmr_lambda=0.7,   # 1.0 = pure relevance, 0.0 = pure diversity
)
The algorithm iteratively selects the result that maximizes lambda * relevance - (1 - lambda) * max_similarity_to_already_selected.
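A minimal sketch of that greedy selection loop, with toy relevance and similarity functions (not FERAL's internals):

```python
def mmr(candidates, relevance, similarity, k, lam=0.7):
    """Greedy MMR: pick the item maximizing
    lam * relevance - (1 - lam) * max similarity to already-selected."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(c):
            penalty = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * penalty
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

rel = {"a": 0.9, "b": 0.85, "c": 0.5}
# "a" and "b" are near-duplicates; "c" is distinct.
sim = lambda x, y: 0.95 if {x, y} == {"a", "b"} else 0.1
print(mmr(["a", "b", "c"], rel, sim, k=2, lam=0.7))  # ['a', 'c']
```

Note how the less relevant but distinct "c" beats the near-duplicate "b" for the second slot once the similarity penalty kicks in.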

Session Compaction

When working memory exceeds its token budget, the compactor summarizes older messages into an episode and evicts them from the active context.
# ~/.feral/config.yaml
memory:
  working_memory_budget: 8000   # tokens
  compaction_trigger: 0.85       # compact when 85% full
  compaction_strategy: summarize # summarize | truncate
The compaction flow:
  1. Select messages beyond the budget.
  2. Prompt the LLM to summarize them into a structured episode.
  3. Insert the episode into episodes table with embedding.
  4. Replace the compacted messages with a system note: [Session compacted — N messages summarized].
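The flow above can be sketched like this, with a trivial summarizer standing in for the LLM call and all names illustrative:

```python
def compact(messages, keep_last, summarize):
    """Summarize everything but the last `keep_last` messages and
    replace the old span with a single system note."""
    old, recent = messages[:-keep_last], messages[-keep_last:]
    if not old:
        return messages, None
    note = {
        "role": "system",
        "content": f"[Session compacted — {len(old)} messages summarized]",
    }
    # In FERAL the summary would be embedded and inserted into `episodes`.
    return [note] + recent, summarize(old)

msgs = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
compacted, summary = compact(
    msgs, keep_last=4,
    summarize=lambda ms: f"summary of {len(ms)} messages",
)
print(len(compacted), compacted[0]["content"])
```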

Wiki Compilation

The Memory Wiki compiles episodes, notes, and knowledge graph entries into durable, human-readable wiki pages organized by topic.
feral memory wiki compile
feral memory wiki list
feral memory wiki read "health"
Wiki pages are stored in ~/.feral/wiki/ as Markdown files with YAML frontmatter tracking provenance:
---
topic: health
sources:
  - episode:abc123
  - kg:triple_456
last_compiled: 2025-06-15T10:30:00Z
---
# Health

- Doctor: Dr. Chen (added 2025-03-10)
- Blood type: O+ (added 2025-01-22)
- Allergies: penicillin (added 2025-04-05)
Compilation runs automatically on a schedule or on-demand. New facts merge into existing pages; conflicts are flagged for user review.

P2P Sync

For multi-device setups (laptop + phone + home server), FERAL supports peer-to-peer memory synchronization over the /sync WebSocket endpoint.
# ~/.feral/config.yaml
sync:
  enabled: true
  peers:
    - ws://homeserver.local:9090/sync
    - ws://phone.local:9090/sync
  conflict_resolution: last_write_wins  # or manual
Sync uses a CRDT-based merge strategy for knowledge graph triples and last-write-wins for episodes. Each node maintains a vector clock to detect and resolve conflicts.
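A toy last-write-wins merge for episode records, keyed by ID and resolved by timestamp; the real sync layer additionally compares vector clocks, which this sketch omits:

```python
def lww_merge(local, remote):
    """local/remote: dicts of id -> record with an 'updated_at' float.
    The newer record wins; unseen remote records are adopted."""
    merged = dict(local)
    for eid, rec in remote.items():
        if eid not in merged or rec["updated_at"] > merged[eid]["updated_at"]:
            merged[eid] = rec
    return merged

local = {"ep1": {"summary": "old", "updated_at": 100.0}}
remote = {
    "ep1": {"summary": "new", "updated_at": 200.0},
    "ep2": {"summary": "phone-only", "updated_at": 150.0},
}
merged = lww_merge(local, remote)
print(merged["ep1"]["summary"], len(merged))  # new 2
```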
# Check sync status
feral memory sync status

# Force sync now
feral memory sync push
feral memory sync pull

API Reference

Endpoint                   Method      Description
/api/memory/search         POST        Hybrid search across all tiers
/api/memory/remember       POST        Store a fact in the knowledge graph
/api/memory/episodes       GET         List recent episodes
/api/memory/wiki           GET         List wiki pages
/api/memory/wiki/{topic}   GET         Read a wiki page
/api/memory/stats          GET         Memory size, tier counts, index health
/sync                      WebSocket   P2P memory sync between nodes