The Tool Use protocol: the line between "code" and a real agent

The three blocks: text, tool_use, tool_result

The content of a Messages API message is an array of content blocks. Blocks the model emits: text (plain string) and tool_use (a tool call, with id, name, and input). The block you send back: tool_result (with a tool_use_id matching the id of the tool_use, and content holding the result).

// Assistant turn that calls a tool
{
  "role": "assistant",
  "content": [
    { "type": "text", "text": "Creating the project now." },
    { "type": "tool_use",
      "id": "toolu_01ABC",
      "name": "create_project",
      "input": { "slug": "my-shop", "display_name": "My Shop" } }
  ]
}

// Your next user turn must answer that tool_use
{
  "role": "user",
  "content": [
    { "type": "tool_result",
      "tool_use_id": "toolu_01ABC",
      "content": "{\"ok\":true,\"project_id\":\"prj_8a7c\"}" }
  ]
}

Critical rule: if the assistant emitted a tool_use, the next user turn must contain a matching tool_result for every tool_use id. Otherwise the API returns a 400. We have seen this often: developers send a fresh user message from the human without closing the prior tool_use, and the loop crashes.
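One way to catch this before the request leaves your process is to scan the history for unanswered calls. The following is a minimal sketch, not part of any SDK; the function name findUnansweredToolUses is hypothetical, and it assumes the message and block shapes shown above.

```javascript
// Returns the ids of tool_use blocks that are not answered by a tool_result
// in the immediately following user message. An empty array means every
// tool_use in the history has been closed.
function findUnansweredToolUses(messages) {
  const missing = [];
  messages.forEach((msg, i) => {
    if (msg.role !== 'assistant' || !Array.isArray(msg.content)) return;
    const callIds = msg.content
      .filter((b) => b.type === 'tool_use')
      .map((b) => b.id);
    if (callIds.length === 0) return;
    const next = messages[i + 1];
    // String content (a plain human message) contains no tool_result blocks.
    const answered = new Set(
      next && next.role === 'user' && Array.isArray(next.content)
        ? next.content
            .filter((b) => b.type === 'tool_result')
            .map((b) => b.tool_use_id)
        : []
    );
    for (const id of callIds) {
      if (!answered.has(id)) missing.push(id);
    }
  });
  return missing;
}
```

Calling it right before each request and refusing to send when the result is non-empty turns a remote 400 into a local, debuggable error.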

Four stop reasons, four handlers

Every API response carries a stop_reason. Most developers ignore it and just concatenate text. That is a mistake. Each value demands different behavior:

end_turn — the model is done. Show the text, wait for next input.
tool_use — the model called a tool and is waiting for a result. Run the tool, return tool_result, continue.
max_tokens — output truncated. Do not show to user as-is; either ask the model to continue or flag a quality issue.
stop_sequence — you hit a stop string you defined. Usually routes to a specific handler.

async function loopOnce(messages, tools) {
  // max_tokens is required by messages.create; 4096 is an arbitrary example value
  const r = await anthropic.messages.create({ model, max_tokens: 4096, messages, tools });
  switch (r.stop_reason) {
    case 'tool_use': {
      // One tool_result block per tool_use block (runAllToolCalls is defined in the next section)
      const results = await runAllToolCalls(r.content);
      return { kind: 'continue', messages: [...messages,
        { role: 'assistant', content: r.content },
        { role: 'user', content: results }
      ]};
    }
    case 'end_turn':
      return { kind: 'done', text: extractText(r) };
    case 'max_tokens':
      return { kind: 'truncated', partial: r.content };
    case 'stop_sequence':
      return { kind: 'stopped', sequence: r.stop_sequence };
    default:
      // Fail loudly instead of silently returning undefined on an unexpected value
      throw new Error(`Unhandled stop_reason: ${r.stop_reason}`);
  }
}

Without a max_tokens branch, eventually you ship a truncated answer to a user. We have seen this in production more than once.
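The loopOnce function above calls an extractText helper and returns a kind that some outer loop must act on, but neither is shown. Here is a minimal sketch of both. The name runUntilDone, the turn_limit kind, and the default of 20 turns are assumptions made for illustration; the step function is passed in so the driver can be tried without network access.

```javascript
// Joins the text blocks of a response and skips every other block type.
function extractText(response) {
  return response.content
    .filter((b) => b.type === 'text')
    .map((b) => b.text)
    .join('');
}

// Drives a loopOnce-style step function until it stops asking to continue.
// A turn cap keeps a misbehaving agent from looping forever.
async function runUntilDone(step, messages, tools, maxTurns = 20) {
  let current = messages;
  for (let turn = 0; turn < maxTurns; turn++) {
    const out = await step(current, tools);
    if (out.kind !== 'continue') return out; // done, truncated, or stopped
    current = out.messages;
  }
  return { kind: 'turn_limit', messages: current };
}
```

With the real function this would be called as runUntilDone(loopOnce, messages, tools); the caller then decides what to do with a truncated or turn_limit result instead of showing it to the user.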

Parallel tool_use

A single tool_use per turn is not the only shape. The API lets the model emit several tool_use blocks in one turn's content array — "call X and Y in parallel". If your code handles only the first one, you leave large latency wins on the table.

Correct handling: run every tool_use in parallel, then send all of them back as tool_results in a single user-message array.

async function runAllToolCalls(content) {
  const calls = content.filter(b => b.type === 'tool_use');
  // run in parallel; each handler validates its own input
  const results = await Promise.all(
    calls.map(async (call) => {
      try {
        const out = await runTool(call.name, call.input);
        return { type: 'tool_result', tool_use_id: call.id,
                 content: JSON.stringify(out) };
      } catch (e) {
        return { type: 'tool_result', tool_use_id: call.id,
                 content: JSON.stringify({ ok: false, error: e.message }),
                 is_error: true };
      }
    })
  );
  return results; // pass as user message content array
}

Two things to note. First, if one tool fails, don't abort the rest — return the specific error with is_error: true and let the others succeed. The model can handle a mixed batch. Second, the tool_result order does not have to mirror the tool_use order; pairing is by tool_use_id.
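The runTool function used by runAllToolCalls is left undefined above. One possible shape is a registry keyed by tool name. This is a sketch under assumptions: the handlers map and the body of the create_project handler are hypothetical, and only the tool name comes from the earlier example.

```javascript
// Hypothetical tool registry: maps tool names to async handlers.
// A Map avoids accidental hits on inherited object keys such as "toString".
const handlers = new Map([
  ['create_project', async ({ slug, display_name }) => {
    if (!slug) throw new Error('slug is required');
    return { ok: true, slug, display_name };
  }],
]);

// Dispatches one call by name. Unknown tools and handler failures throw,
// so the catch branch in runAllToolCalls can turn them into an
// is_error tool_result without aborting the rest of the batch.
async function runTool(name, input) {
  const handler = handlers.get(name);
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(input ?? {});
}
```

Throwing for an unknown name, rather than returning null, keeps the failure visible to the model as an error result it can react to.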

When the model emits JSON-as-text instead of tool_use

It happens. We saw it in Hive with Opus 4.6 over OAuth once the tool count passed 12, and in a few other cases: extremely long prompts, or a combination of tools, thinking, and a heavy system prompt. Instead of a structured tool_use block, the model emits a text block containing JSON or XML that describes the call. The likely cause: when the model gets confused about the protocol, it falls back to its free-form text habits.

Don't treat this as a hard failure. Write a patch that detects JSON-as-text and converts it back to a synthetic tool_use before handing content to the loop:

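let synthCounter = 0; // keeps synthetic ids unique within one process

function recoverToolUseFromText(content) {
  return content.flatMap((b) => {
    if (b.type !== 'text') return [b];
    // Only the <tool_use>{...}</tool_use> wrapper is handled; extend for other shapes you observe
    const m = b.text.match(/<tool_use>\s*(\{[\s\S]+?\})\s*<\/tool_use>/);
    if (!m) return [b];
    try {
      const obj = JSON.parse(m[1]);
      if (typeof obj.name === 'string' && obj.input && typeof obj.input === 'object') {
        // Date.now() alone can collide when two blocks are recovered in the same millisecond
        return [{ type: 'tool_use', id: `synth_${Date.now()}_${synthCounter++}`,
                  name: obj.name, input: obj.input }];
      }
    } catch {
      // Malformed JSON: fall through and keep the original text block
    }
    return [b];
  });
}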

In Hive this rescues ~3% of turns that would otherwise break. Keep the original text in logs — it is a good diagnostic for when the model starts slipping (rising recovery rate means trim your tools or prompt).
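Two details are easy to miss when wiring the recovery in. A recovered response did not arrive with stop_reason tool_use, so a switch on stop_reason would still treat it as finished. And the recovery rate has to be counted somewhere to be useful as a diagnostic. The sketch below addresses both. The names normalizeResponse, recoveryStats, and recoveryRate are hypothetical, and it assumes a recovered turn normally arrives with stop_reason end_turn.

```javascript
// Running counters for the recovery rate. In a real service these would
// be reported to your metrics system instead of kept in memory.
const recoveryStats = { turns: 0, recovered: 0 };

const countToolUses = (content) =>
  content.filter((b) => b.type === 'tool_use').length;

// `recover` is a function shaped like recoverToolUseFromText above.
function normalizeResponse(response, recover) {
  recoveryStats.turns++;
  const content = recover(response.content);
  const wasRecovered =
    countToolUses(content) > countToolUses(response.content);
  if (wasRecovered) recoveryStats.recovered++;
  return {
    ...response,
    content,
    // Route a recovered call through the tool_use branch of the loop.
    stop_reason:
      wasRecovered && response.stop_reason === 'end_turn'
        ? 'tool_use'
        : response.stop_reason,
    recovered: wasRecovered,
  };
}

// Fraction of turns that needed recovery so far.
function recoveryRate() {
  return recoveryStats.turns === 0
    ? 0
    : recoveryStats.recovered / recoveryStats.turns;
}
```

Passing every response through normalizeResponse before the stop_reason switch keeps the rest of the loop unaware that a recovery happened, while the counters give you the trend to watch.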

tool_choice: when to force

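The tool_choice parameter controls what the model must do regarding tools. It is passed as an object with a type field: { "type": "auto" } (the default when tools are provided; the model decides), { "type": "any" } (must call some tool), { "type": "tool", "name": "X" } (must call exactly X), and { "type": "none" } (no tool calls allowed).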

Use any or a specific tool when a step is genuinely mandatory. In Hive, the builder's first turn on a build request must call todo_write — otherwise it skips planning and jumps to write_file. Setting tool_choice: { type: 'tool', name: 'todo_write' } on the first call resolves that completely.

async function startBuild(input) {
  // Force a plan first (max_tokens is required; 4096 is an arbitrary example value)
  const r1 = await anthropic.messages.create({
    model, max_tokens: 4096, messages: [{ role: 'user', content: input }],
    tools, tool_choice: { type: 'tool', name: 'todo_write' }
  });
  // After the plan, switch back to auto (tool_choice is an object, not a bare string)
  return continueLoop(r1, { tool_choice: { type: 'auto' } });
}

Note: Don't set any on every turn; it turns the agent into a tool-firing automaton that cannot stop. any or a specific tool is only correct when the step demands an action. The default auto should cover about 95% of turns.

none is rarer but useful: when you want the model to summarize or talk to the user without sliding back into more tool calls on the same turn.
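The choice between these modes can be kept in one place instead of being scattered across call sites. The following is a small sketch; the phase names plan and summarize are hypothetical, and only the todo_write tool name comes from the example above.

```javascript
// Maps a hypothetical agent phase to a tool_choice object.
// Anything that is not an explicitly forced phase falls back to auto.
function toolChoiceFor(phase) {
  switch (phase) {
    case 'plan':
      // First turn of a build: the plan is mandatory.
      return { type: 'tool', name: 'todo_write' };
    case 'summarize':
      // Final wrap-up: talk to the user, no further tool calls.
      return { type: 'none' };
    default:
      return { type: 'auto' };
  }
}
```

Keeping auto as the fallthrough makes forcing an explicit, reviewable decision rather than something that spreads by copy and paste.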

Retry: by code, not blindly

API retry policy must distinguish error classes, or you create expensive loops or thundering herds. The simple rules:

5xx — server error. Retry with exponential backoff + jitter, up to 5 attempts.
429 — rate limit. Honor the retry-after header. If absent, fall back to backoff with a larger base.
4xx (other than 429) — bad request. Do not retry. Fix the request.
network/timeout — retry up to N with a cap. But if the call was already in flight, beware of double-execution of tool side-effects.

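async function callWithRetry(fn, opts = {}) {
  const max = opts.max ?? 5; // maximum number of retries after the first call
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (e) {
      const status = e.status; // undefined for network errors and timeouts
      // 4xx other than 429: the request itself is wrong, so retrying cannot help
      if (status && status >= 400 && status < 500 && status !== 429) throw e;
      if (attempt >= max) throw e;
      // retry-after is in seconds; a plain-object headers shape is assumed here,
      // so adapt the lookup if your client exposes a Headers instance
      const retryAfter = Number(e.headers?.['retry-after']);
      const base = status === 429 ? 2_000 : 500; // larger base when rate limited
      const wait = status === 429 && Number.isFinite(retryAfter) && retryAfter > 0
        ? retryAfter * 1000
        : Math.min(30_000, base * 2 ** attempt) + Math.random() * 200;
      await new Promise(r => setTimeout(r, wait));
      attempt++;
    }
  }
}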

Most common mistake: retrying 4xx "in case the server forgot". It won't help, because the problem is in the request. In Hive we once watched a service run a 200-attempt loop against a 400 invalid_request_error because the code did not distinguish error classes. Our quota took the hit.
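The warning about double execution of tool side effects can be handled by making tool execution idempotent per call. Below is a minimal sketch with a hypothetical runToolOnce wrapper. It assumes the retry re-processes the same assistant turn, so the tool_use ids are unchanged; if the retry produces a fresh model response, the ids differ and this cache does not help. The in-memory Map is also an assumption, and a multi-process service would need shared storage.

```javascript
// Results (as promises) keyed by tool_use id.
const completed = new Map();

// Runs a tool call at most once per tool_use id.
// `runTool` is injected and is expected to take (name, input).
async function runToolOnce(call, runTool) {
  if (completed.has(call.id)) return completed.get(call.id);
  // Store the promise itself so concurrent duplicates share one execution.
  const pending = runTool(call.name, call.input);
  completed.set(call.id, pending);
  try {
    return await pending;
  } catch (e) {
    // Forget genuine failures so a later retry can run the tool again.
    completed.delete(call.id);
    throw e;
  }
}
```

Tools with external side effects (payments, emails, file writes) are where this matters; for read-only tools a repeated call is merely wasteful.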