Parallel-turn: don't kill a turn just because the user typed

The "new message = abort" trap

In the first version, when a user sent a new message while a turn was still running, we aborted the previous turn. It looked reasonable: "the user changed their mind, why continue". We implemented it with a per-turn AbortController, and every new request called controller.abort() on the previous one.

// the wrong version
async function handleMessage(sessionId, message) {
  const session = await loadSession(sessionId);
  if (session.activeController) {
    session.activeController.abort();   // "superseded"
  }
  session.activeController = new AbortController();
  return runAgentTurn(session, message, {
    signal: session.activeController.signal
  });
}

The damage showed up with users who said "yes, great, keep going" mid-build. They were just affirming; they weren't asking to cancel. But the runtime saw a new message and killed the third write_file mid-call with an AbortError. That left a half-applied deploy in the air and discarded the state. The user got "build failed" for nothing.

We charted every message during an internal dogfooding window, tagged the ones that aborted a live turn, and counted. 54% of aborts were affirmations like "yes", "great", "keep going". Only 8% were clear cancels. The remaining 38% were clarification questions asked during the build. Our superseded logic was mostly self-inflicted damage.

Measuring the cost without blaming the user

The natural engineering reflex is "not our bug, the user sent something mid-flight". That is formally true and operationally damaging. We built three metrics that changed our minds:

build_completion_rate: percent of builds that finished with a successful deploy. 87% before, 44% after the superseded behavior shipped. Not transient — a pattern.
aborted_writes: how many write_file calls died with AbortError. Climbed from 2% to 19%. Every one of those is a chance for a partially-written file or a stranded Redis state.
user_redo_rate: how often users sent the same intent (by dedup hash) twice within five minutes. From 3% to 14%. People repeated themselves because we cut them off.

Together, those three numbers said one thing: superseded was not an optimization, it was a regression. We measured it on internal dogfooding, not on a test scenario, and that distinction always matters. In a test scenario the user cancels. In real use they affirm.
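The following is a minimal sketch of how the three metrics can be computed from raw events. The event fields (`outcome`, `tool`, `error`, `userId`, `text`, `at`) are hypothetical. The "intent hash" here is only a SHA-256 of lowercased, whitespace-collapsed text. The article does not describe its actual dedup method, which may well be fuzzier. The denominator for `user_redo_rate` is assumed to be all user messages.

```javascript
const crypto = require('crypto');

// build_completion_rate: share of started builds that ended in a successful deploy.
function buildCompletionRate(builds) {
  if (builds.length === 0) return 0;
  return builds.filter((b) => b.outcome === 'deployed').length / builds.length;
}

// aborted_writes: share of write_file calls that ended with an AbortError.
function abortedWriteRate(toolCalls) {
  const writes = toolCalls.filter((c) => c.tool === 'write_file');
  if (writes.length === 0) return 0;
  return writes.filter((c) => c.error === 'AbortError').length / writes.length;
}

// Hypothetical intent fingerprint: lowercase, collapse whitespace, hash.
function intentHash(text) {
  const normalized = text.toLowerCase().replace(/\s+/g, ' ').trim();
  return crypto.createHash('sha256').update(normalized).digest('hex');
}

// user_redo_rate: share of user messages that repeat the same user's intent
// hash within windowMs (default five minutes) of an earlier message.
function userRedoRate(messages, windowMs = 5 * 60 * 1000) {
  if (messages.length === 0) return 0;
  const sorted = [...messages].sort((a, b) => a.at - b.at);
  const lastSeen = new Map(); // `${userId}:${hash}` -> timestamp of last occurrence
  let redos = 0;
  for (const m of sorted) {
    const key = `${m.userId}:${intentHash(m.text)}`;
    const prev = lastSeen.get(key);
    if (prev !== undefined && m.at - prev <= windowMs) redos += 1;
    lastSeen.set(key, m.at);
  }
  return redos / sorted.length;
}
```

A literal hash like this only catches near-verbatim repeats. A rephrased redo slips through, so the result is best read as a lower bound.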

Parallel-turn policy: two turns alive at once

The new policy: a new turn doesn't touch the running one. Both run. Each carries its own turn_id. User messages flow into the DB in order (a messages table with created_at), and the model in turn 2 sees the full history including the message that arrived during turn 1.

async function handleMessage(sessionId, message) {
  const session = await loadSession(sessionId);
  // Persist the user message first so every later turn sees it in history.
  await appendMessage(sessionId, { role: 'user', content: message });
  // We don't cancel any running turn.
  const turnId = await startTurn(sessionId, { parent: session.activeTurnId });
  // The turn reads the new message from the persisted history, not from an argument.
  return runAgentTurn(session, { turnId });
}

// /cancel is a distinct endpoint
app.post('/cancel', async (req, res) => {
  const { sessionId, turnId } = req.body;
  await markCancel(sessionId, turnId);
  return res.json({ ok: true });
});

Two turns running in parallel against the same project create a write risk: both may want to touch index.html. We solve it with a per-file lock. Before write_file, the tool tries to acquire the lock for that file. If the lock is held, the tool waits up to 5 seconds and then fails with a clear error. In practice turn 2 waits a moment, turn 1 releases, and we move on.
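One way to get those semantics is an in-process lock table, sketched below. `FileLocks` and `writeFileLocked` are hypothetical names, and the polling interval is an assumption. A deployment with several server processes would need the lock in a shared store instead. Only three details come from the text: the per-file granularity, the 5-second wait, and the turn_id as the lock owner.

```javascript
// Hypothetical in-process per-file lock; illustrates semantics, not a distributed lock.
class FileLocks {
  constructor({ waitMs = 5000, pollMs = 50 } = {}) {
    this.waitMs = waitMs; // how long a writer waits before giving up
    this.pollMs = pollMs; // assumption: simple polling instead of a wait queue
    this.owners = new Map(); // file path -> owning turnId
  }

  // Take the lock for `path` on behalf of `turnId`, polling until waitMs elapses.
  // If the turn already owns the lock this succeeds (not a counting lock).
  async acquire(path, turnId) {
    const deadline = Date.now() + this.waitMs;
    for (;;) {
      const owner = this.owners.get(path);
      if (owner === undefined || owner === turnId) {
        this.owners.set(path, turnId);
        return;
      }
      if (Date.now() >= deadline) {
        const err = new Error(`file ${path} is locked by ${owner}; retry later`);
        err.code = 'FILE_LOCKED';
        throw err;
      }
      await new Promise((resolve) => setTimeout(resolve, this.pollMs));
    }
  }

  // Only the owner may release; a stray release from another turn is ignored.
  release(path, turnId) {
    if (this.owners.get(path) === turnId) this.owners.delete(path);
  }

  // Drop every lock a turn holds, e.g. when the turn closes or is cancelled.
  releaseAll(turnId) {
    for (const [path, owner] of this.owners) {
      if (owner === turnId) this.owners.delete(path);
    }
  }
}

// Run a write under the lock and always release it, even if the write throws.
async function writeFileLocked(locks, path, turnId, write) {
  await locks.acquire(path, turnId);
  try {
    return await write();
  } finally {
    locks.release(path, turnId);
  }
}
```

Releasing in `finally` matters here. A write that throws still frees the file, so the waiting turn does not sit out the full timeout.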

One thing matters: message history is always persisted immediately. Even while turn 1 is still running, the user's new message is already in the DB. That way turn 2 never loses it, even if turn 1 finishes late.
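The ordering guarantee can be sketched with an in-memory stand-in for the messages table. `SessionStore` and `acceptMessage` are hypothetical. The insertion counter used to break ties between equal `created_at` values is an assumption, because the text only says rows are ordered by `created_at`.

```javascript
// Hypothetical in-memory stand-in for the messages table and the turn registry.
class SessionStore {
  constructor() {
    this.messages = []; // rows: { sessionId, role, content, createdAt, seq }
    this.turns = new Map(); // turnId -> { sessionId, parent, status }
    this.seq = 0; // insertion counter, tiebreaker for equal timestamps
    this.nextTurn = 1;
  }

  appendMessage(sessionId, msg) {
    const row = { sessionId, ...msg, createdAt: Date.now(), seq: this.seq++ };
    this.messages.push(row);
    return row;
  }

  // Full session history ordered by created_at, then insertion order.
  history(sessionId) {
    return this.messages
      .filter((m) => m.sessionId === sessionId)
      .sort((a, b) => a.createdAt - b.createdAt || a.seq - b.seq);
  }

  startTurn(sessionId, parent) {
    const turnId = `trn_${this.nextTurn++}`;
    this.turns.set(turnId, { sessionId, parent, status: 'running' });
    return turnId;
  }

  activeTurns(sessionId) {
    return [...this.turns]
      .filter(([, t]) => t.sessionId === sessionId && t.status === 'running')
      .map(([turnId]) => turnId);
  }
}

// Persist first, then start a new turn next to whatever is running.
// Nothing here aborts an existing turn.
function acceptMessage(store, sessionId, content) {
  store.appendMessage(sessionId, { role: 'user', content });
  const running = store.activeTurns(sessionId);
  const parent = running.length > 0 ? running[running.length - 1] : null;
  return store.startTurn(sessionId, parent);
}
```

Calling `acceptMessage` twice leaves both turns running. When the second turn reads `history`, the message that arrived mid-build is already there.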

Explicit cancel: a button, not a heuristic

If a new message no longer cancels, how do you cancel? With a separate action. The UI has a "Stop" button visible only when session.activeTurnId exists. It calls POST /cancel with the explicit turn_id. It is not bound to any message.

Server-side, cancel is a flag in Redis with a TTL. The runtime checks it before every tool call. If the flag is set, it closes the turn cleanly, logs aborted_by_user, and releases locks. It does not abort an already-running tool mid-call — it waits for it to finish.

async function shouldCancel(turnId) {
  return await redis.get(`cancel:${turnId}`) === '1';
}

// inside the agent loop
for (const tool of plan) {
  if (await shouldCancel(turnId)) {
    await closeTurn(turnId, 'aborted_by_user');
    return;
  }
  await runTool(tool);
}
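The loop above relies on something setting the flag. The sketch below covers both sides, with a small in-memory map standing in for Redis. Against a real Redis this would be a `SET cancel:<turnId> 1` with an `EX` expiry. `TtlFlags`, `runPlan`, and the 10-minute TTL are assumptions. The article states only that the flag lives in Redis with a TTL and is checked before every tool call.

```javascript
// Hypothetical in-memory stand-in for Redis keys with an expiry.
class TtlFlags {
  constructor() {
    this.entries = new Map(); // key -> expiry time in ms since epoch
  }
  set(key, ttlMs) {
    this.entries.set(key, Date.now() + ttlMs);
  }
  has(key) {
    const expiresAt = this.entries.get(key);
    if (expiresAt === undefined) return false;
    if (Date.now() >= expiresAt) {
      this.entries.delete(key); // expired, like a Redis key past its TTL
      return false;
    }
    return true;
  }
}

const CANCEL_TTL_MS = 10 * 60 * 1000; // assumption: the real TTL is not stated

// What POST /cancel would do: set a flag scoped to one explicit turn_id.
function markCancel(flags, turnId) {
  flags.set(`cancel:${turnId}`, CANCEL_TTL_MS);
}

// The flag is checked only between tool calls, so a tool that has started
// always runs to completion before the turn closes.
async function runPlan(flags, turnId, plan, runTool) {
  const completed = [];
  for (const tool of plan) {
    if (flags.has(`cancel:${turnId}`)) {
      return { status: 'aborted_by_user', completed }; // caller releases locks
    }
    await runTool(tool);
    completed.push(tool);
  }
  return { status: 'done', completed };
}
```

Because the check sits between tool calls, a Stop pressed during a long write takes effect only after that write returns. Stopping is slower, but nothing is ever left half-written.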

Two benefits: first, the user knows they cancelled — they pressed a button. No ambiguity. Second, we never tear down a tool mid-execution, which kills the long tail of AbortError damage.

What worked: After the switch, build_completion_rate climbed back to 89%. user_redo_rate dropped to 2%. The "Stop" button is pressed on 1.4% of turns. That made it concrete: real cancels were always rare.

Surfacing two turns in the UI without confusing the user

Parallel turns create a UI challenge. If the user sent "add a contact page" mid-build, they now have two turns, and they need to see what each one is doing. Here is how we handle it:

Each turn gets its own card in the feed. The first card belongs to turn 1, the next to turn 2, in order.
Cards show status: thinking, using tool: write_file, done. Updated in real time over SSE.
If turn 1 and turn 2 are both live, two cards are active. A separate Stop button on each.
"System" messages that aren't a turn (confirmations, errors) are a third kind of card. Never collapsed, never blended.

<div class="turn-card" data-turn-id="trn_8821">
  <header>
    <span class="phase-badge">building</span>
    <button class="cancel-btn" data-turn-id="trn_8821">Stop</button>
  </header>
  <div class="steps">…</div>
</div>

Rule: one turn equals one card. Anything on screen that isn't inside a card isn't a turn. That helps users understand they can send messages while something is still running — and it helps us during debugging.
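The one-turn-one-card rule maps naturally onto a reducer keyed by turn_id. This sketch assumes a hypothetical SSE event shape: `kind`, `turnId` and `status` for turns, and `kind`, `id` and `text` for system messages. The statuses mirror the ones listed above, and `aborted_by_user` is borrowed from the cancel path.

```javascript
// Apply one SSE event to the feed (an array of cards) and return a new feed.
function applyEvent(feed, event) {
  if (event.kind === 'system') {
    // System messages get their own card; they never replace a turn card.
    return [...feed, { type: 'system', id: event.id, text: event.text }];
  }
  const idx = feed.findIndex((c) => c.type === 'turn' && c.turnId === event.turnId);
  if (idx === -1) {
    // First event for this turn: append a new card, preserving arrival order.
    return [...feed, { type: 'turn', turnId: event.turnId, status: event.status }];
  }
  // Later events update the existing card in place: one turn, one card.
  const next = feed.slice();
  next[idx] = { ...next[idx], status: event.status };
  return next;
}

// Each live turn card carries its own Stop button; finished ones do not.
function showsStop(card) {
  return card.type === 'turn' && card.status !== 'done' && card.status !== 'aborted_by_user';
}
```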

Parallel-turn pitfalls

Three hazards we didn't see initially and that are worth preparing for:

Turns colliding on writes: turn 1 is writing about.html, and turn 2 wants the same file. Without a file lock, one silently overwrites the other. We added a per-file lock whose owner is the writing turn's turn_id, and the loser gets a clear error it can retry on.
Turn 2's message refers to turn 1: "yes, great, continue". When turn 2's prompt contains that message, the model may "continue" work that turn 1 is already doing. We added this to the system prompt: "if a previous turn is still running, do not duplicate work; respond only to what's new".
Out-of-order history: turn 1 finishes after turn 2. Assistant messages from both are persisted by real created_at, not by turn_id. We made sure to log start and end timestamps separately so the order can be reconstructed for replays.
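For the out-of-order case, the sketch below builds the two views the text distinguishes. The first is the model-facing history, sorted by real `created_at`. The second is a replay timeline rebuilt from per-turn start and end timestamps. Storing `turnStartedAt` and `turnEndedAt` on each assistant row is an assumption about the schema; the article only says the two timestamps are logged separately.

```javascript
// What the model sees: every message ordered by real created_at, not by turn_id.
function historyForModel(rows) {
  return [...rows].sort((a, b) => a.createdAt - b.createdAt);
}

// For replays: rebuild each turn's lifecycle from its start and end timestamps.
function replayTimeline(rows) {
  const turns = new Map(); // turnId -> { startedAt, endedAt }
  for (const r of rows) {
    if (r.turnId && !turns.has(r.turnId)) {
      turns.set(r.turnId, { startedAt: r.turnStartedAt, endedAt: r.turnEndedAt });
    }
  }
  const events = [];
  for (const [turnId, t] of turns) {
    events.push({ at: t.startedAt, what: `start ${turnId}` });
    // A turn that is still running has no end timestamp yet.
    if (t.endedAt !== undefined) events.push({ at: t.endedAt, what: `end ${turnId}` });
  }
  return events.sort((a, b) => a.at - b.at).map((e) => e.what);
}
```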

Warning: Don't allow more than 2 active turns per session. Two is reasonable: a long turn plus a clarification. Three is a user who has lost the plot. If a third turn arrives while two are live, we throttle it: the third waits until one finishes.
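The two-turn cap can be sketched as a small per-session gate. The first two turns are admitted immediately, and a third waits in FIFO order until `finish` frees a slot. `TurnGate` is a hypothetical name. Like the other sketches here, it holds state in memory, so it only works if all of a session's requests reach the same process.

```javascript
// Hypothetical per-session admission gate: at most maxActive live turns.
class TurnGate {
  constructor(maxActive = 2) {
    this.maxActive = maxActive;
    this.active = new Set(); // turnIds currently allowed to run
    this.waiting = []; // FIFO of { turnId, resolve }
  }

  // Resolves once the turn is allowed to run.
  admit(turnId) {
    if (this.active.size < this.maxActive) {
      this.active.add(turnId);
      return Promise.resolve();
    }
    return new Promise((resolve) => this.waiting.push({ turnId, resolve }));
  }

  // Call when a turn finishes or is cancelled; hands the slot to the next waiter.
  finish(turnId) {
    if (!this.active.delete(turnId)) return;
    const next = this.waiting.shift();
    if (next) {
      this.active.add(next.turnId);
      next.resolve();
    }
  }
}
```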

The core tradeoff: the server is more complex, the UX is simpler. If you have to pick, pick the UX.