Engineering
Building Real-Time Agent Memory
The architecture behind Parley's persistent memory layer — tradeoffs, failures, and what we landed on.

Emily Osei
Parley
Senior Technical Writer

Agent memory is often misunderstood as a storage problem. In reality, it is an attention problem: what should the system choose to remember, and when should it retrieve that information?
We initially experimented with a naive approach — storing everything the user says and retrieving it via semantic search. This quickly broke down. The system became noisy, inconsistent, and overconfident in irrelevant past context.
We rebuilt memory as a real-time decision layer embedded into execution.
The architecture is structured into three layers:
Short-term memory captures the immediate session context: what is happening right now, within a single interaction flow. This layer is highly volatile and is often overwritten.
Mid-term memory stores working facts: temporary but reusable knowledge such as ongoing tasks, partial outputs, and intermediate decisions.
Long-term memory stores stable user preferences, recurring patterns, and durable facts that meaningfully influence future behavior.
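To make the three layers concrete, here is a minimal sketch of how they could be modeled. The names `Tier`, `MemoryItem`, and `TieredMemory` are hypothetical, and the rule that short-term memory is cleared at session end is an assumption for illustration, not a description of Parley's implementation.

```python
import time
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    SHORT_TERM = "short_term"  # immediate session context, volatile
    MID_TERM = "mid_term"      # working facts: tasks, partial outputs, decisions
    LONG_TERM = "long_term"    # stable preferences, patterns, durable facts


@dataclass
class MemoryItem:
    key: str
    value: str
    tier: Tier
    created_at: float = field(default_factory=time.time)


class TieredMemory:
    """Hypothetical container: one dict per tier, keyed by item key."""

    def __init__(self) -> None:
        self._stores: dict[Tier, dict[str, MemoryItem]] = {t: {} for t in Tier}

    def write(self, key: str, value: str, tier: Tier) -> MemoryItem:
        # Writing an existing key replaces the old item, which mirrors the
        # "often overwritten" behavior described for short-term memory.
        item = MemoryItem(key, value, tier)
        self._stores[tier][key] = item
        return item

    def read(self, tier: Tier) -> list[MemoryItem]:
        return list(self._stores[tier].values())

    def end_session(self) -> None:
        # Assumption: short-term memory does not outlive the session.
        self._stores[Tier.SHORT_TERM].clear()
```

Keeping the tiers as separate stores means each one can have its own lifetime and overwrite rules without affecting the others.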
The key innovation is not storage — it is retrieval prioritization.
Every memory candidate is scored on five signals:
- relevance to the current task
- recency decay
- confidence level
- historical usefulness
- contradiction risk with the current context
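One way the five signals could be combined into a single score is sketched below. The weights, the one-hour half-life, the exponential decay, and the choice to apply contradiction risk as a multiplier are all illustrative assumptions; the article does not specify a formula, and `Candidate`, `recency`, and `score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float           # 0..1, similarity to the current task
    age_seconds: float         # time since the memory was written
    confidence: float          # 0..1, how sure we are the memory is correct
    usefulness: float          # 0..1, share of past retrievals that helped
    contradiction_risk: float  # 0..1, chance it conflicts with current context


# Illustrative values only; not taken from the article.
WEIGHTS = {"relevance": 0.4, "recency": 0.2, "confidence": 0.2, "usefulness": 0.2}
HALF_LIFE_SECONDS = 3600.0


def recency(age_seconds: float, half_life: float = HALF_LIFE_SECONDS) -> float:
    # Exponential decay: 1.0 when fresh, 0.5 after one half-life.
    return 0.5 ** (age_seconds / half_life)


def score(c: Candidate) -> float:
    base = (
        WEIGHTS["relevance"] * c.relevance
        + WEIGHTS["recency"] * recency(c.age_seconds)
        + WEIGHTS["confidence"] * c.confidence
        + WEIGHTS["usefulness"] * c.usefulness
    )
    # Contradiction risk scales the whole score down, so a memory that
    # certainly conflicts with the current context scores zero.
    return base * (1.0 - c.contradiction_risk)
```

Treating contradiction risk as a multiplier instead of a subtracted term is a design choice in this sketch: it lets a likely contradiction suppress a memory no matter how relevant or recent it is.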
We also learned that memory must be aggressively selective. Systems that “remember everything” tend to degrade because irrelevant context becomes indistinguishable from signal.
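Aggressive selectivity can be sketched as a score floor followed by a small item budget. The `Scored` type, the `select` function, and the default threshold and budget are hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    text: str
    score: float


def select(
    candidates: list[Scored], min_score: float = 0.6, max_items: int = 3
) -> list[Scored]:
    # Apply the floor first, so a weak candidate is never included just
    # because there is room left in the budget.
    kept = [c for c in candidates if c.score >= min_score]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:max_items]
```

With this shape, returning nothing is a valid outcome: if no candidate clears the floor, the agent proceeds without retrieved memory.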
In production, the most important metric is not memory size but memory precision under pressure. A small, highly relevant memory store consistently outperforms a large, noisy one.
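The article does not define how memory precision is measured. One plausible reading, assumed here, is the fraction of retrieved memories that were actually relevant to the task. The function name `retrieval_precision` and the relevance labels are hypothetical.

```python
def retrieval_precision(retrieved: list[str], relevant: set[str]) -> float:
    # Fraction of retrieved memories that were judged relevant.
    if not retrieved:
        return 0.0  # convention chosen for this sketch
    hits = sum(1 for memory in retrieved if memory in relevant)
    return hits / len(retrieved)


relevant = {"likes dark mode", "drafting Q3 report"}

# A small, selective retrieval versus a larger, noisier one.
small = ["likes dark mode", "drafting Q3 report"]
noisy = small + ["asked about weather", "mentioned a cat"]
```

Under this definition, `retrieval_precision(small, relevant)` is 1.0 and `retrieval_precision(noisy, relevant)` is 0.5, even though both retrievals contain the same two useful memories.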


