Session-as-State, not heuristics

Context is not state

One of the most common mistakes in agent design is conflating context with state. Context is the last 30 messages. State is what you promise is true right now. Context is soft, noisy, and can contradict itself. State is rigid, unambiguous, and has an owner.

Take an example. Eight messages ago the user said "let's build a new site". Three messages ago they said "no, stay with the existing one". Now they type "deploy it". What is the state? If you rely on context, the answer depends on how far back you read and what rules you apply. If you rely on state, the answer is session.project_id = "prj_8821", and there is nothing left to debate.

Our session state is deliberately minimal — anything that doesn't need to be there, isn't:

CREATE TABLE sessions (
  session_id      TEXT PRIMARY KEY,
  user_id         TEXT NOT NULL,
  project_id      TEXT,           -- locked after assignment
  phase           TEXT NOT NULL,  -- chatting|building|verifying|done
  pending_tool    JSONB,          -- active tool_use, if any
  cancel_token    TEXT,
  updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

Seven columns. Each one has a documented writer and a documented moment when it is written. There is no "last_user_intent", no "likely_action", no field whose value was guessed from context.
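To make the contrast concrete, here is a minimal sketch of the same conversation resolved two ways. The guessProjectFromContext heuristic, the message shapes, and the resolveDeployTarget helper are hypothetical names invented for illustration; they are not part of our runtime.

```javascript
// Hypothetical illustration: one conversation, resolved two ways.
const messages = [
  { role: 'user', text: "let's build a new site" },
  { role: 'user', text: 'no, stay with the existing one' },
  { role: 'user', text: 'deploy it' },
];

// Context-based heuristic (the approach argued against): the answer
// depends on how many messages the window happens to include.
function guessProjectFromContext(history, windowSize) {
  const recent = history.slice(-windowSize);
  const wantsNew = recent.some((m) => m.text.includes('new site'));
  const wantsExisting = recent.some((m) => m.text.includes('existing'));
  if (wantsNew && !wantsExisting) return 'new';
  if (wantsExisting) return 'existing';
  return 'unknown';
}

// State-based resolution: read the one field that has an owner.
function resolveDeployTarget(session) {
  if (!session.project_id) {
    throw new Error('no project assigned to this session');
  }
  return session.project_id;
}

const session = { session_id: 'sess_A', project_id: 'prj_8821', phase: 'verifying' };

console.log(guessProjectFromContext(messages, 1)); // unknown
console.log(guessProjectFromContext(messages, 2)); // existing
console.log(resolveDeployTarget(session));         // prj_8821
```

The heuristic changes its answer with the window size; the state lookup does not.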

Who writes, when, and why only them

The central rule: every session field has a single, named writer. If there are two, there will be a contradiction sooner or later. Ours:

project_id: written exactly once, by build_and_deploy with confirm_new_project=true, or by the open existing route. After write — locked. No tool can change it mid-turn.
phase: written by the matching transition. todo_complete("build") moves to verifying. publish moves to done.
pending_tool: written by the runtime itself the moment it begins executing a tool_use. Cleared when the tool_result arrives.
cancel_token: written only by the /cancel endpoint, called explicitly by the user.

async function lockProjectId(sessionId, projectId) {
  const { rowCount } = await db.query(
    `UPDATE sessions
        SET project_id = $1, updated_at = now()
      WHERE session_id = $2 AND project_id IS NULL`,
    [projectId, sessionId]
  );
  if (rowCount !== 1) throw new Error('project_id already locked');
}

Note the WHERE project_id IS NULL. This isn't just defensive — it's the contract. Once the field is written, no other code path can change it without raising. That single rule is what eliminated every flavor of ghost project for us.
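The ownership rules can also be written down as data and enforced in one place. The sketch below is an in-memory illustration only: FIELD_WRITERS, writeField, attempt, and the writer labels are hypothetical names, and the real guarantee still comes from the guarded SQL update.

```javascript
// Hypothetical ownership table: field -> the only writers allowed to set it.
const FIELD_WRITERS = {
  project_id: ['build_and_deploy', 'open_existing_route'],
  phase: ['transition'],
  pending_tool: ['runtime'],
  cancel_token: ['cancel_endpoint'],
};

// Returns a new session object, or throws if the write breaks the contract.
function writeField(session, field, value, writer) {
  if (!Object.hasOwn(FIELD_WRITERS, field)) {
    throw new Error(`unknown field: ${field}`);
  }
  if (!FIELD_WRITERS[field].includes(writer)) {
    throw new Error(`${writer} may not write ${field}`);
  }
  // project_id is write-once, mirroring WHERE project_id IS NULL.
  if (field === 'project_id' && session.project_id != null) {
    throw new Error('project_id already locked');
  }
  return { ...session, [field]: value };
}

// Small helper so the examples can show the error message instead of crashing.
function attempt(fn) {
  try {
    fn();
    return 'ok';
  } catch (err) {
    return err.message;
  }
}

let s = { session_id: 'sess_A', project_id: null, phase: 'chatting' };
s = writeField(s, 'project_id', 'prj_8821', 'build_and_deploy');
console.log(s.project_id); // prj_8821
console.log(attempt(() => writeField(s, 'project_id', 'prj_9999', 'build_and_deploy')));
// project_id already locked
console.log(attempt(() => writeField(s, 'phase', 'done', 'runtime')));
// runtime may not write phase
```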

An explicit state machine, not a hidden one

If you look at agent code and try to find the state machine, it usually isn't there physically. It's scattered across if statements, prompt fragments, and behavior inferred from history. It works — until it doesn't. Ours is a single table:

const TRANSITIONS = {
  chatting:  { open_existing: 'chatting', start_build: 'building' },
  building:  { todo_done_build: 'verifying', cancel: 'chatting' },
  verifying: { publish: 'done', revert: 'building', cancel: 'chatting' },
  done:      { open_existing: 'chatting', start_build: 'building' }
};

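function transition(session, event) {
  // Own-property checks: inherited keys such as "constructor" must not
  // count as phases or events.
  const events = Object.hasOwn(TRANSITIONS, session.phase)
    ? TRANSITIONS[session.phase]
    : undefined;
  if (!events || !Object.hasOwn(events, event)) {
    throw new Error(
      `invalid transition: phase=${session.phase}, event=${event}`
    );
  }
  return events[event];
}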

Every transition is documented. A new transition means a new row in the table. A tool that attempts a transition that isn't in the table gets a clear error, and that error goes to the top of the log. In our dogfooding this is how we caught a handful of invalid transitions, each one a real bug that never reached a user.

This table is unambiguous. Reading the model prompt plus this table lets you predict 95% of agent behavior. From the prompt alone — never.
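Because the state machine is plain data, it can also be checked mechanically, for example in a unit test. The following is a minimal sketch: validateTransitions is a hypothetical helper, and the table is copied from above so the block runs on its own.

```javascript
const TRANSITIONS = {
  chatting:  { open_existing: 'chatting', start_build: 'building' },
  building:  { todo_done_build: 'verifying', cancel: 'chatting' },
  verifying: { publish: 'done', revert: 'building', cancel: 'chatting' },
  done:      { open_existing: 'chatting', start_build: 'building' },
};

// Returns a list of problems; an empty list means the table is consistent.
function validateTransitions(table, initialPhase) {
  const problems = [];

  // 1. Every transition must point at a phase that has its own row.
  for (const [phase, events] of Object.entries(table)) {
    for (const [event, target] of Object.entries(events)) {
      if (!Object.hasOwn(table, target)) {
        problems.push(`${phase} --${event}--> unknown phase ${target}`);
      }
    }
  }

  // 2. Every phase must be reachable from the initial phase (breadth-first).
  const seen = new Set([initialPhase]);
  const queue = [initialPhase];
  while (queue.length > 0) {
    const phase = queue.shift();
    for (const target of Object.values(table[phase] ?? {})) {
      if (!seen.has(target)) {
        seen.add(target);
        queue.push(target);
      }
    }
  }
  for (const phase of Object.keys(table)) {
    if (!seen.has(phase)) problems.push(`unreachable phase: ${phase}`);
  }
  return problems;
}

console.log(validateTransitions(TRANSITIONS, 'chatting')); // []
```

A check like this fails the build when someone adds a row that points at a misspelled phase, before any tool hits it at runtime.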

Concurrency: two tabs, one user

The day you discover your state isn't real state is the day a user opens two tabs. Tab A is mid-build, tab B sends "kill the build". How do you know which session that message applies to?

Two rules worked for us:

session_id lives in a tab-scoped cookie, not a user cookie. Each tab is a distinct session. Two tabs from one user see two distinct sessions on purpose.
project_id can be shared across sessions, but actions go through the session. If tab B wants to stop a build running in tab A, it has to call /cancel?session_id=A. It knows A's id via rendezvous, broadcast on a pub/sub channel shared by all of the user's tabs.

// internal pub/sub
await redis.publish(
  `user:${userId}:sessions`,
  JSON.stringify({ event: 'session_open', sessionId, projectId })
);

This feels overengineered until a user opens four tabs on the same project and wonders why two of them stalled. The clear separation between session and project is what makes multi-tab work without tabs trampling each other.
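The rendezvous can be sketched without Redis. In the block below, Channel is an in-memory stand-in for the pub/sub channel, and openTab, cancelUrlFor, and the session_present reply event are hypothetical additions for illustration. The reply event is one assumed way for a tab that opens late to learn about tabs that announced themselves earlier.

```javascript
// Minimal in-memory stand-in for the pub/sub channel (hypothetical).
class Channel {
  constructor() {
    this.subscribers = new Map(); // topic -> array of handlers
  }
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(handler);
  }
  publish(topic, payload) {
    for (const handler of this.subscribers.get(topic) ?? []) {
      handler(JSON.parse(payload));
    }
  }
}

// Each tab announces itself and tracks the sessions of its sibling tabs.
function openTab(channel, userId, sessionId, projectId) {
  const known = new Map(); // sibling sessionId -> projectId
  const topic = `user:${userId}:sessions`;
  const announce = (event) =>
    channel.publish(topic, JSON.stringify({ event, sessionId, projectId }));

  channel.subscribe(topic, (msg) => {
    if (msg.sessionId === sessionId) return; // ignore our own announcements
    known.set(msg.sessionId, msg.projectId);
    // Reply to newcomers so they learn about us too (assumed protocol).
    if (msg.event === 'session_open') announce('session_present');
  });
  announce('session_open');

  return {
    sessionId,
    known,
    // Build the cancel URL for a sibling session working on the project.
    cancelUrlFor(targetProjectId) {
      for (const [sid, pid] of known) {
        if (pid === targetProjectId) {
          return `/cancel?session_id=${encodeURIComponent(sid)}`;
        }
      }
      return null;
    },
  };
}

const channel = new Channel();
const tabA = openTab(channel, 'u1', 'sess_A', 'prj_8821');
const tabB = openTab(channel, 'u1', 'sess_B', 'prj_8821');
console.log(tabB.cancelUrlFor('prj_8821')); // /cancel?session_id=sess_A
```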

What does not belong in the session

Lack of discipline about what belongs in a session is how it grows to 2 KB and you start paying network time on every request. Our rule: if it can be recomputed from the DB, it does not live in the session.

Examples of fields we rejected:

messages: chat history lives in its own table with FTS. The session keeps at most last_message_id. We load from the table when we need it.
tool_results_cache: a results cache doesn't belong in the session. It belongs in Redis with a TTL.
user_preferences: belongs in the users table. The session pulls them via JOIN if it must.
todos: are per-project, not per-session. The project_id in the session is enough to reach them.

Note: If you find yourself writing session.set('something') in multiple places, something is probably a cache, not state. Move it to Redis with a TTL and stop polluting your session.

A small session is a fast session. It writes atomically, loads with one query, and doesn't need complex migrations. If yours is over 1 KB, something is creeping in that shouldn't be there.
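The 1 KB rule of thumb is easy to turn into a check. This is a minimal sketch that measures the JSON-serialized size; sessionSizeBytes and oversizedFields are hypothetical helpers, and the sample rows are made-up data.

```javascript
const MAX_SESSION_BYTES = 1024; // the 1 KB rule of thumb from the text

// UTF-8 byte length of a JSON-serialized value.
function jsonBytes(value) {
  return new TextEncoder().encode(JSON.stringify(value) ?? '').length;
}

function sessionSizeBytes(session) {
  return jsonBytes(session);
}

// If the session is over the limit, list its fields largest first,
// so the field that crept in is at the top. Otherwise return [].
function oversizedFields(session, limit = MAX_SESSION_BYTES) {
  if (sessionSizeBytes(session) <= limit) return [];
  return Object.entries(session)
    .map(([field, value]) => ({ field, bytes: jsonBytes(value) }))
    .sort((a, b) => b.bytes - a.bytes);
}

const lean = {
  session_id: 'sess_A',
  user_id: 'u1',
  project_id: 'prj_8821',
  phase: 'building',
  pending_tool: null,
  cancel_token: null,
};

// The same session after chat history was stuffed into it.
const bloated = {
  ...lean,
  messages: Array.from({ length: 50 }, (_, i) => ({ id: i, text: 'x'.repeat(40) })),
};

console.log(oversizedFields(lean));             // []
console.log(oversizedFields(bloated)[0].field); // messages
```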

Failure modes and how session-as-state helps

Three failure scenarios every agent hits, all addressed by keeping session strict:

Turn dies mid-flight: the runtime crashes after a tool has started. Without a session you don't know what was in flight. With pending_tool written to the DB before the tool runs, and cleared only after the tool_result arrives, we know exactly where we stopped and can resume or cancel cleanly.
Same-message retry: client sends the same message twice on a timeout. last_message_id in the session plus a check against the messages table — server spots the duplicate and returns the same response.
Multi-tab race: tab A calls publish, tab B calls cancel at the same instant. Both go through an update with WHERE phase IN (...). Only one wins. The other client gets a clear error.

async function publishProject(sessionId) {
  const { rowCount } = await db.query(
    `UPDATE sessions SET phase = 'done'
      WHERE session_id = $1 AND phase = 'verifying'`,
    [sessionId]
  );
  if (rowCount !== 1) throw new Error('cannot publish from current phase');
}
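The multi-tab race can be simulated in memory to show why only one caller wins. In this sketch the Map stands in for the sessions table, and updatePhaseIf, publish, and cancel are hypothetical helpers that mirror the guarded UPDATE; the phases cancel is allowed from are taken from the transition table.

```javascript
// In-memory stand-in for the sessions table (hypothetical).
const sessions = new Map([['sess_A', { phase: 'verifying' }]]);

// Mirrors `UPDATE ... WHERE session_id = $1 AND phase IN (...)`:
// returns the number of rows changed (0 or 1).
function updatePhaseIf(sessionId, allowedPhases, nextPhase) {
  const row = sessions.get(sessionId);
  if (!row || !allowedPhases.includes(row.phase)) return 0;
  row.phase = nextPhase;
  return 1;
}

function publish(sessionId) {
  if (updatePhaseIf(sessionId, ['verifying'], 'done') !== 1) {
    throw new Error('cannot publish from current phase');
  }
}

function cancel(sessionId) {
  if (updatePhaseIf(sessionId, ['building', 'verifying'], 'chatting') !== 1) {
    throw new Error('cannot cancel from current phase');
  }
}

// Tab A publishes and tab B cancels; whichever runs second sees the
// phase the first one left behind and is rejected.
const outcomes = [publish, cancel].map((action) => {
  try {
    action('sess_A');
    return 'ok';
  } catch (err) {
    return err.message;
  }
});

console.log(outcomes); // [ 'ok', 'cannot cancel from current phase' ]
console.log(sessions.get('sess_A').phase); // done
```

In PostgreSQL under the default READ COMMITTED isolation level, the second UPDATE waits for the first to commit and then re-evaluates its WHERE clause against the updated row, so the database gives the same one-winner outcome for truly concurrent requests.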

The point: these aren't "defensive coding" — they're a consequence of an explicit state model that forces you to write the right condition into every update.
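The first scenario, a turn dying mid-flight, follows the same pattern: record intent before acting, clear it only after the result. A minimal synchronous sketch, assuming hypothetical runTool and recoverInFlight helpers and an in-memory object in place of the sessions row (the real runtime is asynchronous and writes to the DB):

```javascript
// In-memory stand-in for one sessions row (hypothetical).
const session = { session_id: 'sess_A', pending_tool: null };

// Record the tool_use before executing it; clear it only after the result.
function runTool(session, toolUse, execute) {
  session.pending_tool = { id: toolUse.id, name: toolUse.name, input: toolUse.input };
  const result = execute(toolUse); // a crash here leaves pending_tool set
  session.pending_tool = null;
  return result;
}

// On restart: anything still recorded was in flight when the process died.
// null means the last turn ended cleanly.
function recoverInFlight(session) {
  return session.pending_tool;
}

// A clean run leaves nothing behind.
runTool(session, { id: 'tu_1', name: 'build_and_deploy', input: {} }, () => 'ok');
console.log(recoverInFlight(session)); // null

// A simulated crash (an exception standing in for the process dying)
// leaves the marker in place for recovery.
try {
  runTool(session, { id: 'tu_2', name: 'publish', input: {} }, () => {
    throw new Error('simulated crash');
  });
} catch (err) {
  // the runtime is "down"; nothing clears pending_tool
}
console.log(recoverInFlight(session).id); // tu_2
```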