Iron rules for the agent loop

TL;DR

Iron rules do not live in the system prompt alone. The runtime enforces them with a check on every tool call: never call the same tool with identical args twice in a row, never claim 'done' without verification, never fabricate file paths, declare intent before mutating shared state, and ask the user instead of guessing when a request is ambiguous. In Hive, these checks cut the agent loop-lock rate from ~9% to under 1%.

Why iron, not suggestion

It is tempting to write "please don't call the same tool twice in a row" into the system prompt and move on. It does not work. We have logs of the same agent, with the same prompt, calling read_file('routes/client.js') four times in a row — each retry hoping that this time the file would be different.

Agents do not obey instructions when the instruction is not coupled to a consequence. The system prompt is a request. Enforcement is a consequence. The difference: in a prompt you say "do not do X"; in enforcement, if the agent does X, the call fails and returns a specific error. LLMs learn faster from feedback than from instruction.

The rules split in two: liveness (don't wedge the loop) and safety (don't cause harm). Both must be enforced where bypass is impossible — the runtime that runs tools, not the LLM that picks them.
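
To make "enforced in the runtime" concrete, here is a minimal sketch of a dispatcher that runs every guard before a tool executes. The names firstRejection and guardedDispatch, and the guard signature, are assumptions for illustration, not Hive's actual code.

```javascript
// Hypothetical guard pipeline. A guard takes (call, history) and returns
// null to allow the call, or an error object to reject it.
function firstRejection(call, history, guards) {
  for (const guard of guards) {
    const rejection = guard(call, history);
    if (rejection) return rejection;
  }
  return null;
}

// The model only proposes a call. The runtime decides whether it runs.
async function guardedDispatch(call, history, guards, runTool) {
  const rejection = firstRejection(call, history, guards);
  const result = rejection ?? (await runTool(call.name, call.args));
  // Rejections are recorded too, so the model sees the consequence.
  history.push({ name: call.name, args: call.args, result });
  return result;
}
```

Because the guards sit between the model's request and runTool, no wording in the prompt can bypass them. Each rule below can be read as one guard in this list.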

Rule #1: same tool, same args, twice in a row is forbidden

This is the classic loop. The agent calls a tool, dislikes the result, and calls it again with identical args — as if the tool would respond differently. In our incident logs this accounted for ~9% of stuck-loop cases.

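// True when the most recent call used the same tool with the same args.
// JSON.stringify is sensitive to key order, so { a, b } and { b, a } compare as different.
function detectRepeat(toolName, args, history) {
  const last = history.at(-1);
  if (!last || last.name !== toolName) return false;
  return JSON.stringify(last.args) === JSON.stringify(args);
}

async function executeTool(toolName, args, ctx, history) {
  if (detectRepeat(toolName, args, history)) {
    // `?? null` guards a missing result: JSON.stringify(undefined) returns undefined, and .slice would throw.
    const previous = JSON.stringify(history.at(-1).result ?? null).slice(0, 200);
    return toolError('duplicate_call',
      `${toolName} was just called with identical args. ` +
      `Previous result: ${previous}. ` +
      `Either change args, call a different tool, or stop.`);
  }
  return runTool(toolName, args, ctx);
}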

The error message matters as much as the check. The agent gets the previous result handed back to it, plus the actual menu of options. Without the previous result, it sometimes forgets it already saw the answer.

Note: Identical calls separated by a few turns are fine, because state may have changed. The rule fires only when the call comes immediately after its predecessor.
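
One caveat with comparing args as JSON strings: the same args with keys in a different order serialize differently, so a repeat can slip past the check. The sketch below compares a canonical, key-sorted form instead. canonicalize, sameArgs, and detectRepeatCanonical are hypothetical names, and this is one possible approach rather than what Hive ships.

```javascript
// Rebuild a value with object keys sorted, so key order stops mattering.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === 'object') {
    return Object.keys(value).sort().reduce((out, key) => {
      out[key] = canonicalize(value[key]);
      return out;
    }, {});
  }
  return value;
}

function sameArgs(a, b) {
  return JSON.stringify(canonicalize(a)) === JSON.stringify(canonicalize(b));
}

// Variant of detectRepeat that compares canonical forms.
function detectRepeatCanonical(toolName, args, history) {
  const last = history.at(-1);
  return Boolean(last) && last.name === toolName && sameArgs(last.args, args);
}
```

Array order is preserved on purpose: ['a', 'b'] and ['b', 'a'] may mean different things to a tool, so only object key order is normalized.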

Rule #2: no 'done' without verification

Agents love to say "done". They will say it after a single turn that ran nothing but todo_write — no call that produced a checkable result. That is hallucinated completion.

The rule: before an end_turn framed as completion, there must be a specific tool call that verified the deliverable. Our checklist:

Task is build — must have a green critique on a URL.
Task is bug fix — must have a green run_tests.
Task is write — must have a read_file that confirms the content.
Task is analysis — must contain quantitative findings in the summary (numbers, not just prose).

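// Tools that must have succeeded before each task kind may declare completion.
// Analysis tasks have no entry: they are judged on the summary text, not on a tool call.
const REQUIRED_BEFORE_DONE = {
  build: ['critique'],
  fix: ['run_tests'],
  write: ['read_file'],
};

// True when every required tool appears in history with result.ok === true.
function canEmitDone(taskKind, history) {
  const required = REQUIRED_BEFORE_DONE[taskKind] || [];
  return required.every(name => history.some(t => t.name === name && t.result?.ok === true));
}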

If the agent attempts end_turn without satisfying the requirement, we inject a system message: "You have not verified the deliverable. Call <tool> against the produced artifact before declaring done." It comes back with the correct tool.
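
The injection step could look roughly like the sketch below, which names the missing tool in the message and adds a crude digit check for analysis tasks. gateEndTurn, missingVerification, the summary parameter, and the wording of the analysis message are assumptions for illustration.

```javascript
// Assumed history shape: entries look like { name, result: { ok } }.
const REQUIRED_TOOLS = {
  build: ['critique'],
  fix: ['run_tests'],
  write: ['read_file'],
};

// Required tools that have no successful call in history yet.
function missingVerification(taskKind, history) {
  const required = REQUIRED_TOOLS[taskKind] || [];
  return required.filter(
    name => !history.some(t => t.name === name && t.result?.ok === true)
  );
}

// Decide whether end_turn may pass, and what to tell the agent if not.
function gateEndTurn(taskKind, history, summary = '') {
  const missing = missingVerification(taskKind, history);
  if (missing.length > 0) {
    return {
      allow: false,
      systemMessage:
        'You have not verified the deliverable. ' +
        `Call ${missing.join(', ')} against the produced artifact before declaring done.`,
    };
  }
  // Crude heuristic for analysis tasks: the summary must contain a digit.
  if (taskKind === 'analysis' && !/\d/.test(summary)) {
    return {
      allow: false,
      systemMessage: 'The summary contains no numbers. Add quantitative findings before declaring done.',
    };
  }
  return { allow: true };
}
```

A failed run_tests does not count as verification here, since the check requires result.ok to be true. That matches canEmitDone above.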

Rule #3: no fabricated file paths

LLMs fabricate paths. They see a Node project, assume "there's probably a config/database.js", and call read_file('config/database.js') which returns ENOENT. They do not get confused by this — they just fabricate the next path.

The rule: any tool call that takes a path must come after a list_directory or find_files result that mentioned that path. If the agent calls read_file('foo.js') without seeing the path in an earlier tool result this turn, the call is rejected.

const fs = require('fs');

// True when the path appears somewhere in an earlier successful tool result.
// This is a substring match, so 'a.js' is also satisfied by a result that lists 'data.js'.
function pathWasObserved(filePath, history) {
  return history.some(call => {
    if (!call.result?.ok) return false;
    const blob = JSON.stringify(call.result);
    return blob.includes(filePath);
  });
}

async function readFile(args, ctx, history) {
  if (!pathWasObserved(args.path, history)) {
    return toolError('unverified_path',
      `Path '${args.path}' was not observed in any earlier tool result. ` +
      `Call list_directory or find_files first to confirm it exists.`);
  }
  return fs.promises.readFile(args.path, 'utf8');
}

This is deliberately aggressive. It produces false positives — for example, standard paths like package.json that genuinely exist. We patch that with a short allow-list (package.json, README.md, tsconfig.json). Anything else must come from a prior tool result.
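
The allow-list is not shown in the code above. A minimal sketch of how it might combine with the observation check follows. It also matches whole whitespace-separated tokens rather than substrings, which is a stricter variant than the original check. pathIsAllowed and collectStrings are hypothetical names, and the assumption is that tool results carry paths either as array entries or as whitespace-separated text.

```javascript
// The three allow-listed paths named in the text.
const ALLOWED_WITHOUT_LISTING = new Set(['package.json', 'README.md', 'tsconfig.json']);

// Collect every string found anywhere inside a tool result.
function collectStrings(value, out = []) {
  if (typeof value === 'string') {
    out.push(value);
  } else if (Array.isArray(value)) {
    value.forEach(v => collectStrings(v, out));
  } else if (value && typeof value === 'object') {
    Object.values(value).forEach(v => collectStrings(v, out));
  }
  return out;
}

// Allowed if allow-listed, or if an earlier successful result
// contains the path as a whole token.
function pathIsAllowed(filePath, history) {
  if (ALLOWED_WITHOUT_LISTING.has(filePath)) return true;
  return history.some(call =>
    call.result?.ok === true &&
    collectStrings(call.result).some(s => s.split(/\s+/).includes(filePath))
  );
}
```

Token matching is stricter than it looks: a listing that reports './src/a.js' will not authorize 'src/a.js'. Whether to normalize paths before comparing is a design choice this sketch leaves open.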

Rule #4: declare intent before mutating shared state

When a tool mutates state that anyone else can observe — DB, file on a shared volume, message broker — the agent must declare its intent before executing. The declaration is a visible assistant message stating what it is about to do, and why. Without it, if the agent is wrong, there is nothing to review.

For us this came after the routes/client.js incident where the old Runtime Rescue created ghost projects. The new rule: before build_and_deploy, the agent must produce a user-visible message along the lines of "I'm about to update existing project prj_8f2a1c" or "I'm about to create a new project named 'X' because confirm_new_project=true was set".

// Text of the most recent assistant message, or '' if there is none.
function lastAssistantMessage(history) {
  return [...history].reverse().find(t => t.role === 'assistant')?.text || '';
}

async function buildAndDeploy(args, ctx, history) {
  const intent = lastAssistantMessage(history);
  // Guard project_id first: String.prototype.includes(undefined) searches for the text "undefined".
  const namesProject = Boolean(args.project_id) && intent.includes(args.project_id);
  const declaresProject = namesProject ||
    (args.confirm_new_project && /create.*new project/i.test(intent));
  if (!declaresProject) {
    return toolError('intent_not_declared',
      'You must state which project this build targets before calling build_and_deploy.');
  }
  return runBuild(args, ctx);
}

The declaration helps human review, but it also forces the LLM to surface the decision — and when it writes it out, it sometimes catches itself being wrong before it commits.

Rule #5: ask when ambiguous, do not guess

This rule cuts against instinct. Agents are trained to be helpful; faced with an ambiguous request, they pick an interpretation and run with it. That was the cause of our first delete_project incident: "clean up old ones" got an operational reading instead of a clarifying question.

Enforcement here is harder — you cannot check "was this ambiguous". But you can demand that every risk≥3 action carry a user_quote field in its args: a verbatim quote from the user that explicitly authorizes this specific action.

async function deleteProject(args, ctx) {
  // Length floor rejects empty or token-sized quotes such as "ok".
  if (!args.user_quote || args.user_quote.length < 8) {
    return toolError('missing_user_quote',
      'delete_project requires user_quote: a verbatim quote from the user that authorizes this specific deletion.');
  }
  const project = await db.getProject(args.project_id);
  // Assumes getProject resolves to null or undefined for an unknown id.
  if (!project) {
    return toolError('project_not_found', `No project with id '${args.project_id}'.`);
  }
  if (!quoteMentions(args.user_quote, project.name)) {
    return toolError('quote_does_not_match',
      `user_quote does not reference project '${project.name}'. ` +
      `Ask the user to confirm the deletion of this specific project.`);
  }
  return runDelete(args, ctx);
}

This forces the agent to stop and find where the user actually authorized the action. If it cannot find a quote naming the project, the answer is not "skip enforcement" — it is "ask the user for explicit confirmation".
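
The quoteMentions helper is not shown above. A minimal sketch of it follows, together with an extra check that the quote really appears in a user message, since otherwise the agent could write the quote itself. Both bodies are assumptions: quoteIsVerbatim is a hypothetical addition, and the history shape ({ role, text }) is borrowed from lastAssistantMessage in Rule #4.

```javascript
// Normalize for comparison: lowercase and collapse whitespace.
function normalize(text) {
  return String(text).toLowerCase().replace(/\s+/g, ' ').trim();
}

// Hypothetical quoteMentions: the quote must contain the project name.
function quoteMentions(quote, projectName) {
  const name = normalize(projectName);
  // An empty name would match every quote, so reject it outright.
  return name.length > 0 && normalize(quote).includes(name);
}

// Hypothetical extra check: the quote must occur in something the user wrote.
function quoteIsVerbatim(quote, history) {
  const needle = normalize(quote);
  return needle.length > 0 && history.some(
    t => t.role === 'user' && normalize(t.text || '').includes(needle)
  );
}
```

With both checks in place, a quote that names the project but was never typed by the user is rejected, and so is a genuine user quote such as "clean up old ones" that names no project.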

Win: After we added user_quote to dangerous tools, unauthorized-action incidents went from 3–4 per month to zero across the last 90 days.

Anti-pattern: everything in the system prompt

The temptation is to write the five rules as a paragraph in the system prompt and move on. We did that in 2025. We watched the prompt balloon past 3,400 tokens and saw compliance with "ask when ambiguous" drop by ~40% under load — once context fills up, the early instructions lose force.

The lesson: prompts explain, code enforces. The five rules stay in the prompt in short form (four lines, not twenty), but the code path alone decides whether a tool runs.

The second benefit of code-level enforcement: it is testable. We write unit tests against the rules themselves:

test('rejects identical tool calls in a row', async () => {
  const history = [{ name: 'read_file', args: { path: 'a.js' }, result: { ok: true } }];
  const result = await executeTool('read_file', { path: 'a.js' }, ctx, history);
  expect(result.error).toBe('duplicate_call');
});

test('allows identical args after another tool intervenes', async () => {
  const history = [
    { name: 'read_file', args: { path: 'a.js' }, result: { ok: true } },
    { name: 'list_directory', args: { path: '.' }, result: { ok: true } },
  ];
  const result = await executeTool('read_file', { path: 'a.js' }, ctx, history);
  expect(result.ok).toBe(true);
});

Rules that have tests are the rules that survive the next refactor. Prompt-only rules break the moment the prompt gets edited.

Note: If you cannot write a unit test for a rule, it is not a rule. It is a tagline.