What turns a model into an agent: the sense → think → act loop

What the agent loop actually is

People picture the Claude API as chat. The moment you add tools, you have a state machine. Each model call returns one of three interesting states: the model is done (end_turn), the model wants a tool to run (tool_use), or the output was truncated (max_tokens). The loop is the code that handles the second case — it keeps going until it reaches the first.

This complexity adds no value of its own; it is a consequence of how the API works. The agent cannot run read_file itself; the runtime runs it and returns the output as a tool_result in the next user message. The history grows on every iteration: assistant (tool_use) → user (tool_result) → assistant (more tool_use or end_turn).

In pseudo-code:

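// Skeleton only: no iteration cap, budget, or cancel handling yet.
while (true) {
  const response = await client.messages.create({ model, system, tools, messages });
  // Always append the raw assistant content, tool_use blocks included.
  messages.push({ role: 'assistant', content: response.content });
  if (response.stop_reason === 'end_turn') break; // the model is done
  if (response.stop_reason === 'tool_use') {
    // Run every requested tool and keep each result paired with its tool_use id.
    const results = await Promise.all(response.content
      .filter(b => b.type === 'tool_use')
      .map(b => runTool(b.name, b.input).then(r => ({ tool_use_id: b.id, ...r }))));
    // Results go back as tool_result blocks in the next user message.
    messages.push({ role: 'user', content: results.map(toToolResult) });
    continue;
  }
  // max_tokens and anything else is surfaced instead of silently looping.
  throw new Error(`unexpected stop_reason: ${response.stop_reason}`);
}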

That is the skeleton. Everything interesting about agents is what you add around it.
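The skeleton calls a toToolResult helper without defining it. A minimal sketch of what such a helper might look like, assuming runTool resolves to an object with either an output field or an error string. Both field names are hypothetical and not taken from the original code:

```javascript
// Hypothetical helper: turns one finished tool run into a tool_result block.
// Assumes `run` looks like { tool_use_id, output } on success or
// { tool_use_id, error: 'message' } on failure; these field names are illustrative.
function toToolResult(run) {
  const failed = typeof run.error === 'string';
  return {
    type: 'tool_result',
    tool_use_id: run.tool_use_id, // echo the id from the matching tool_use block
    content: failed ? run.error : JSON.stringify(run.output ?? null),
    ...(failed ? { is_error: true } : {}),
  };
}
```

The only part that is not negotiable is tool_use_id: it has to be the id the API returned on the tool_use block.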

Stop conditions: not just end_turn

while (true) with no stop condition is a fire trap. In production we have seen at least five states that require exiting the loop, and only one of them is a natural end_turn.

end_turn — the model is done. The final output is the text in the content blocks.
max iterations — tools-per-turn cap. Ours is 50 for the builder, 12 for the router. Crossing it closes the loop with an error and saves a checkpoint.
budget exceeded — the job's token budget is gone. Stop even mid-thought.
tool error fatal — an error flagged non-recoverable (auth failure, for example). Continuing wastes tokens.
user cancel — the user sent /cancel. Explicitly not the older "superseded by new message" rule — we removed that after it cut builds in half.

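function shouldStop({ response, iter, budget, ctx }) {
  // Natural finish: the model has nothing more to call.
  if (response.stop_reason === 'end_turn') return { stop: true, reason: 'end_turn' };
  if (iter >= ctx.maxIters) return { stop: true, reason: 'max_iters' };
  if (budget.tokensUsed > budget.tokensCap) return { stop: true, reason: 'budget' };
  // Assumed flag: set by the runtime when a tool reports a non-recoverable error.
  if (ctx.fatalToolError) return { stop: true, reason: 'tool_error_fatal' };
  if (ctx.cancelled) return { stop: true, reason: 'user_cancel' };
  return { stop: false };
}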

Every stop reason is recorded — in the response, in the DB, in the trace. Without that record, debugging why a particular build did not finish is impossible.
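The budget check in shouldStop needs a running token count. One way to keep it is sketched here, assuming the budget object has the tokensUsed and tokensCap fields that shouldStop reads and that each response carries the Messages API usage object with input_tokens and output_tokens. The chargeBudget name and the choice to count input plus output tokens are assumptions:

```javascript
// Adds one response's token usage to the job budget and reports whether the cap is crossed.
// `budget` mirrors the { tokensUsed, tokensCap } shape from shouldStop; summing
// input and output tokens is an assumption about how the budget is defined.
function chargeBudget(budget, usage) {
  const spent = (usage.input_tokens ?? 0) + (usage.output_tokens ?? 0);
  const tokensUsed = budget.tokensUsed + spent;
  return { ...budget, tokensUsed, exceeded: tokensUsed > budget.tokensCap };
}

// Example: a 10,000-token cap crossed by the second call.
let budget = { tokensUsed: 0, tokensCap: 10000 };
budget = chargeBudget(budget, { input_tokens: 4000, output_tokens: 500 });
budget = chargeBudget(budget, { input_tokens: 5200, output_tokens: 600 });
console.log(budget.tokensUsed, budget.exceeded); // 10300 true
```

Because every iteration resends the whole history, input tokens are charged again on each call, which is why the count climbs faster than the visible output suggests.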

History shape: what is stored and in what order

The history is not just a list of messages. It is the input to every model call inside the loop, so its shape determines cost, cache hits, and what the model actually "sees".

Three rules we paid for:

tool_result must come in the user message immediately after the assistant message that holds its tool_use. Defer the tool_result because "we want to ask the user first" and the API will return a 400. It is unforgiving here.
Do not merge tool_results from different turns into one user message. Each tool_use → tool_result round gets its own message pair. Merging breaks the cache prefix.
Store raw content blocks, not just text. If you persist only response.content[0].text you lose the tool_use blocks and the loop is no longer replayable.

// after model call
messages.push({ role: 'assistant', content: response.content });

// after tools run
messages.push({
  role: 'user',
  content: toolUseBlocks.map(block => ({
    type: 'tool_result',
    tool_use_id: block.id,
    content: serializeResult(results[block.id]),
    is_error: results[block.id].error === true,
  })),
});

Note: We also persist raw_response in the DB next to the message. It doubles storage but saves us when debugging any future model-side oddity.
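The adjacency rule can be checked mechanically before each model call. A sketch of such a check, assuming messages are kept in the raw block form shown above; findUnansweredToolUse is a hypothetical name:

```javascript
// Returns the ids of tool_use blocks that are not answered by a tool_result
// in the immediately following user message. An empty array means the
// history satisfies the adjacency rule.
function findUnansweredToolUse(messages) {
  const missing = [];
  messages.forEach((msg, i) => {
    if (msg.role !== 'assistant' || !Array.isArray(msg.content)) return;
    const wanted = msg.content.filter(b => b.type === 'tool_use').map(b => b.id);
    if (wanted.length === 0) return;
    const next = messages[i + 1];
    const answered = new Set(
      next && next.role === 'user' && Array.isArray(next.content)
        ? next.content.filter(b => b.type === 'tool_result').map(b => b.tool_use_id)
        : []
    );
    for (const id of wanted) if (!answered.has(id)) missing.push(id);
  });
  return missing;
}
```

Run it after the tool results have been pushed; a trailing assistant message whose tools have not run yet would be reported as unanswered.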

Parallel tool calls inside one turn

Claude can return several tool_use blocks in one response. Run them serially and you pay all their latency in series. Our builder routinely emits 4-6 parallel read_file calls at the start of a build; the gap between serial and parallel is 8-12 seconds.

async function executeToolBlocks(blocks, ctx) {
  return Promise.all(blocks.map(async block => {
    try {
      const result = await runTool(block.name, block.input, ctx);
      return { tool_use_id: block.id, type: 'tool_result', content: JSON.stringify(result) };
    } catch (err) {
      return {
        tool_use_id: block.id,
        type: 'tool_result',
        content: `error: ${err.message}`,
        is_error: true,
      };
    }
  }));
}

Two subtleties that do not show up in a hello world:

If two tools mutate the same resource (write_file to the same path) you need serialization on your side. The model requests parallel calls without checking for conflicts.
If one tool throws, do not let the exception escape. Every tool_use_id needs a tool_result, with is_error: true on the ones that failed, or the next call returns 400.
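Serializing writes to the same path does not require giving up parallelism for everything else. A minimal sketch of a per-path lock built from promise chaining; withPathLock and the in-memory tails map are hypothetical and only coordinate work inside a single process:

```javascript
// Tail of the pending work per path. Each stored promise never rejects,
// so one failed write cannot block the writes queued behind it.
const tails = new Map();

// Runs `task` only after all earlier tasks for the same path have settled.
// Tasks for different paths still run concurrently.
function withPathLock(path, task) {
  const prev = tails.get(path) ?? Promise.resolve();
  const run = prev.then(() => task());
  const tail = run.catch(() => {});
  tails.set(path, tail);
  // Drop the entry once the queue for this path has drained.
  tail.then(() => {
    if (tails.get(path) === tail) tails.delete(path);
  });
  return run; // the caller still sees the task's own result or error
}
```

Inside a write_file handler this could wrap the actual write, for example withPathLock(input.path, () => doWrite(input)), where doWrite stands in for whatever performs the write.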

Error policy that does not confuse the model

A tool error is part of the conversation — it lands in the tool_result. Its phrasing directly steers what the model does next. A bad error ("undefined is not a function") sends the agent into a guessing loop. A good error tells it what to try instead.

Our template for every tool error:

{
  "error": true,
  "code": "file_not_found",
  "message": "path 'src/components/Hero.jsx' does not exist",
  "hint": "list the directory with list_dir before reading",
  "recoverable": true
}

Three things every good tool error must contain:

code — short, machine-readable, stable across versions.
hint — what to call or do instead. Without this, the model guesses.
recoverable — if false, the runtime exits the loop. If true, the model gets a chance to fix.

Warning: Do not stuff stack traces into message. They burn 200-2000 tokens, distract the model, and rarely help. Keep them in the trace, not in the history.
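A thrown exception has to be mapped into that template somewhere. A sketch of one possible mapper, assuming Node.js-style errors with a code property. ENOENT is a real Node.js file-system error code; AUTH_FAILED, the fallback entry, and the toToolError name are hypothetical:

```javascript
// Illustrative lookup from low-level error codes to the template fields.
const KNOWN_ERRORS = {
  ENOENT: { code: 'file_not_found', hint: 'list the directory with list_dir before reading', recoverable: true },
  // Hypothetical code a tool might set when credentials are rejected.
  AUTH_FAILED: { code: 'auth_failed', hint: 'stop and report the failure', recoverable: false },
};

function toToolError(err) {
  const known = KNOWN_ERRORS[err?.code] ?? {
    code: 'tool_failed',
    hint: 'check the tool input and try a different approach',
    recoverable: true,
  };
  // Only the first line of the message reaches the model; the stack stays in the trace.
  const message = String(err?.message ?? err).split('\n')[0];
  return { error: true, code: known.code, message, hint: known.hint, recoverable: known.recoverable };
}
```

The returned object would be serialized into the tool_result content, and the runtime can read recoverable from it to decide whether to leave the loop.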

Loop observability: without traces, no debugging

An agent without a trace is a black box. In production we found it impossible to understand why a build got stuck after 32 iterations without seeing the whole sequence. We log one trace row per iteration:

CREATE TABLE agent_loop_trace (
  id            BIGSERIAL PRIMARY KEY,
  job_id        UUID NOT NULL,
  iter          INT NOT NULL,
  stop_reason   TEXT,
  tool_calls    JSONB NOT NULL,    -- [{name, input_hash, ms, ok}]
  input_tokens  INT,
  output_tokens INT,
  cache_read    INT,
  cache_write   INT,
  ts            TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX ON agent_loop_trace (job_id, iter);

Note: input_hash, not full input. It is enough to detect repeats and saves storage. Full inputs already live in the history.
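One way to produce the entries for the tool_calls column is sketched below, using SHA-256 from Node's built-in crypto module. The key-sorting serializer, the 16-character truncation, and the toTraceCall name are assumptions; any stable hash would serve the same purpose:

```javascript
const { createHash } = require('node:crypto');

// Serializes a value with object keys sorted, so { a, b } and { b, a } hash the same.
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value && typeof value === 'object') {
    const body = Object.keys(value).sort()
      .map(k => `${JSON.stringify(k)}:${stableStringify(value[k])}`);
    return `{${body.join(',')}}`;
  }
  return JSON.stringify(value) ?? 'null';
}

// Builds one entry for the tool_calls column: {name, input_hash, ms, ok}.
function toTraceCall(block, ms, ok) {
  const input_hash = createHash('sha256')
    .update(stableStringify(block.input))
    .digest('hex')
    .slice(0, 16); // truncated for storage; an assumption, not a requirement
  return { name: block.name, input_hash, ms, ok };
}
```

Two calls with the same name and the same input_hash inside one job are the signature of an agent repeating itself.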

Two queries on this trace run constantly in our dashboards:

Percentage of jobs that stopped on max_iters — signals a stop-condition or prompt problem.
Histogram of iterations to end_turn — when the right tail spreads, the iron rules are not strong enough.
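Both numbers can be derived from the trace rows. The sketch below works on rows already loaded into memory and assumes the highest-iter row of each job carries the loop's final stop reason (such as max_iters or end_turn) in stop_reason. In practice these would more likely be SQL aggregates over agent_loop_trace; the function names are hypothetical:

```javascript
// Keeps the highest-iter row per job, i.e. the row where the loop stopped.
function lastRowPerJob(rows) {
  const last = new Map();
  for (const row of rows) {
    const seen = last.get(row.job_id);
    if (!seen || row.iter > seen.iter) last.set(row.job_id, row);
  }
  return [...last.values()];
}

// Fraction of jobs whose final row stopped on max_iters.
function maxItersRate(rows) {
  const finals = lastRowPerJob(rows);
  if (finals.length === 0) return 0;
  return finals.filter(r => r.stop_reason === 'max_iters').length / finals.length;
}

// Histogram keyed by the iteration at which a job reached end_turn.
function itersToEndTurn(rows) {
  const histogram = {};
  for (const r of lastRowPerJob(rows)) {
    if (r.stop_reason !== 'end_turn') continue;
    histogram[r.iter] = (histogram[r.iter] ?? 0) + 1;
  }
  return histogram;
}
```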

Common loop pitfalls

Four mistakes we have seen ourselves and elsewhere, ordered by frequency:

No cap on iterations. An agent locks into a loop and burns $40 a minute. A hard cap (50) is table stakes.
An exception escapes instead of becoming a tool_result. The job fails, but the model never saw the error and will repeat it next time. Always return a tool_result with is_error: true.
No serialization on parallel writes. Two tools write to the same path, one stomps the other, the build breaks half an hour later. Undebuggable without a real trace.
Losing the tool_use_id between tool_use and tool_result. Mint a new id or skip one and the next API call returns 400. Always echo the id the API gave you, never invent your own.

Win: Once we added iter and stop_reason to the trace, mean time to debug a stuck agent dropped from 25 minutes to under 3.

The loop is the core of every agent. Most of the time it runs a turn or two and finishes. But the day it runs 50 turns on a single job, you want to know exactly why.