{
"cells": [
{
"cell_type": "markdown",
"id": "7a969399",
"metadata": {
"papermill": {
"duration": 0.008287,
"end_time": "2026-05-27T03:49:14.321358+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:14.313071+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"# 02 · Tool Use — give the agent access to external tools\n",
"\n",
"> **TL;DR.** Bind one or more *tools* (Python functions exposed to the LLM via a JSON schema) to the model. The LLM decides when to call a tool, reads the tool's result, and either calls another tool or produces a final answer. This is the simplest \"agentic\" pattern that goes beyond a single forward pass.\n",
">\n",
"> **Reach for it when** the answer requires information the model can't have memorized: live data (weather, stock prices, news), private data (your company's docs), or deterministic computation (math, code execution).\n",
"> **Avoid when** the answer is in the model's training data and structured output would do — calling a tool you don't need adds latency and one more failure point.\n",
"\n",
"| Property | Value |\n",
"|---|---|\n",
"| Origin | OpenAI function-calling API, June 2023 — conceptual ancestor: Toolformer (Schick et al., 2023) |\n",
"| Reasoning type | Reactive (no explicit *thought* step — see ReAct, notebook 03, for that) |\n",
"| External tools needed? | **Yes** (web search by default) |\n",
"| Memory across episodes? | No |\n",
"| Provider requirement | Must support **tool calling** (Nebius, OpenAI, Anthropic, Groq, Together, Fireworks, Mistral, Google, recent Ollama) |\n",
"| Typical tool calls | 1–4 per task |\n",
"\n",
"This notebook keeps the original scenario (research assistant doing live web queries) but rebuilds the implementation on top of the library's `ToolUse` class."
]
},
{
"cell_type": "markdown",
"id": "5c313a5c",
"metadata": {
"papermill": {
"duration": 0.008037,
"end_time": "2026-05-27T03:49:14.329395+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:14.321358+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 2 · Architecture at a glance\n",
"\n",
"```mermaid\n",
"flowchart LR\n",
"    A([user task]) --> AG[Agent<br/>LLM bound with tools]\n",
"    AG --> Q{tool_calls<br/>present?}\n",
"    Q -->|yes| T[ToolNode<br/>executes the called tools]\n",
"    T --> AG\n",
"    Q -->|no| F([final answer])\n",
"\n",
"    style AG fill:#e3f2fd,stroke:#1976d2\n",
"    style T fill:#fff3e0,stroke:#f57c00\n",
"```\n",
"\n",
"The graph has only two nodes: an **Agent** (the LLM with `bind_tools(...)`) and a **ToolNode** (a LangGraph prebuilt node that executes the requested tools in parallel). `tools_condition`, also a LangGraph prebuilt, inspects the latest message and routes to `tools` if it contains pending tool calls, and to `END` otherwise."
]
},
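{
"cell_type": "markdown",
"id": "e2b7c410",
"metadata": {},
"source": [
"The routing rule is small enough to sketch in plain Python. The block below is an illustrative stand-in, not LangGraph's implementation: `route_after_agent` and the dict-shaped messages are hypothetical, and the real `tools_condition` works on LangChain message objects.\n",
"\n",
"```python\n",
"END = '__end__'  # string value of LangGraph's END sentinel\n",
"\n",
"\n",
"def route_after_agent(messages):\n",
"    # Hypothetical stand-in for tools_condition: inspect only the latest message.\n",
"    last = messages[-1]\n",
"    # Pending tool calls mean the tools node must run before the agent speaks again.\n",
"    if last.get('tool_calls'):\n",
"        return 'tools'\n",
"    return END\n",
"\n",
"\n",
"pending = [{'role': 'ai', 'tool_calls': [{'name': 'web_search', 'args': {'query': 'x'}, 'id': 'c1'}]}]\n",
"done = [{'role': 'ai', 'content': 'final answer', 'tool_calls': []}]\n",
"print(route_after_agent(pending), route_after_agent(done))  # tools __end__\n",
"```\n",
"\n",
"Because the decision depends only on the last message, the same two-node graph serves one tool call or several without extra state."
]
},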
{
"cell_type": "markdown",
"id": "1b44cb13",
"metadata": {
"papermill": {
"duration": 0.0,
"end_time": "2026-05-27T03:49:14.337800+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:14.337800+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 3 · Theory\n",
"\n",
"### 3.1 · The shift from \"completion\" to \"tool-augmented agent\"\n",
"\n",
"Before tool-calling (mid-2023), the LLM produced a single chunk of text in response to a prompt. Anything that required *up-to-date facts*, *private data*, or *arithmetic the model can't do reliably* had to be handled with awkward workarounds — ReAct-style prompting, plugins, or downstream scaffolding that parsed natural-language pseudo-commands out of the output.\n",
"\n",
"OpenAI's function-calling API replaced those workarounds by training models to emit a **structured `tool_calls` field** rather than embedding tool invocations in free text. The LLM now produces:\n",
"\n",
"```json\n",
"{\n",
"  \"tool_calls\": [\n",
"    {\"name\": \"web_search\", \"args\": {\"query\": \"LangGraph release date\"}}\n",
"  ]\n",
"}\n",
"```\n",
"\n",
"— which downstream code can execute deterministically. After execution, the tool's result is appended to the conversation as a `ToolMessage`, the LLM is invoked again, and it sees both the original request and the tool output before deciding what to do next.\n",
"\n",
"### 3.2 · The minimal control loop\n",
"\n",
"```python\n",
"from langchain_core.messages import ToolMessage\n",
"\n",
"def tool_loop(llm_with_tools, messages, run_tool):\n",
"    # Keep calling the model until it stops requesting tools.\n",
"    while True:\n",
"        response = llm_with_tools.invoke(messages)\n",
"        messages.append(response)\n",
"        if not response.tool_calls:\n",
"            return response.content  # final answer\n",
"        for tc in response.tool_calls:  # each call is a dict: name, args, id\n",
"            result = run_tool(tc[\"name\"], tc[\"args\"])\n",
"            messages.append(ToolMessage(content=str(result), tool_call_id=tc[\"id\"]))\n",
"```\n",
"\n",
"LangGraph's `StateGraph + ToolNode + tools_condition` is this same loop expressed as a graph, which makes it interruptible, observable in LangSmith, and replaceable with a more elaborate routing strategy when you grow into ReAct (notebook 03), Planning (notebook 04), or PEV (notebook 06).\n",
"\n",
"### 3.3 · Where Tool Use sits in the taxonomy\n",
"\n",
"| Pattern | Loop body | Thought step? | Plan ahead? | Use this when... |\n",
"|---|---|---|---|---|\n",
"| **Tool Use** *(this notebook)* | act → observe | no | no | a single query benefits from one or two external calls |\n",
"| ReAct (nb 03) | think → act → observe | **yes** | no | multi-step reasoning needs intermediate thoughts to stay coherent |\n",
"| Planning (nb 04) | plan once → execute step-by-step | no | **yes** | the task naturally decomposes into a fixed sequence |\n",
"| PEV (nb 06) | plan → exec → verify → maybe replan | no | yes + verification | actions can fail and you need automatic recovery |\n",
"| Agentic RAG (nb 23) | decide-to-retrieve → retrieve → answer | yes | no | the agent owns *when* to retrieve, not just *what* |\n",
"\n",
"Tool Use is the \"single forward step\" version of ReAct. If you find yourself wanting the agent to write a *because-of-this* sentence between tool calls, you've grown into ReAct.\n",
"\n",
"### 3.4 · The three failure modes you'll see in § 9\n",
"\n",
"1. **Over-search** — the agent keeps calling the search tool even after it has enough information. Fix: a system prompt that explicitly tells it to stop after 2–3 calls (we use this).\n",
"2. **Result drift** — the agent searches, gets a relevant result, then *answers from parametric knowledge anyway*, ignoring the result. We'll see this live in the captured run.\n",
"3. **Bad query selection** — the agent issues queries that are too vague to be useful (\"information about X\" instead of \"X release date 2024\"). Tool Use has no built-in fix; ReAct's *thought* step is the standard upgrade.\n"
]
},
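{
"cell_type": "markdown",
"id": "9d41a7f3",
"metadata": {},
"source": [
"The loop from § 3.2, together with a round cap that guards against over-search, can be exercised offline. The sketch below is illustrative only: `run_with_cap`, `scripted_model`, and the dict-shaped messages are hypothetical stand-ins, and nothing here is the library's `ToolUse` code.\n",
"\n",
"```python\n",
"def run_with_cap(model, tools, task, max_rounds=4):\n",
"    # model: callable(messages) -> dict with 'content' and 'tool_calls' (assumed shape)\n",
"    messages = [{'role': 'user', 'content': task}]\n",
"    for _ in range(max_rounds):\n",
"        response = model(messages)\n",
"        messages.append(response)\n",
"        if not response['tool_calls']:\n",
"            return response['content'], messages  # final answer\n",
"        for tc in response['tool_calls']:\n",
"            result = tools[tc['name']](**tc['args'])\n",
"            messages.append({'role': 'tool', 'tool_call_id': tc['id'], 'content': str(result)})\n",
"    return None, messages  # cap reached without a final answer\n",
"\n",
"\n",
"def scripted_model(messages):\n",
"    # Stand-in for an LLM: search once, then answer from the tool result.\n",
"    if messages[-1]['role'] == 'tool':\n",
"        return {'role': 'ai', 'content': 'Answer based on: ' + messages[-1]['content'], 'tool_calls': []}\n",
"    call = {'name': 'web_search', 'args': {'query': messages[0]['content']}, 'id': 'call_1'}\n",
"    return {'role': 'ai', 'content': '', 'tool_calls': [call]}\n",
"\n",
"\n",
"tools = {'web_search': lambda query: 'stub result for ' + query}\n",
"answer, trace = run_with_cap(scripted_model, tools, 'LangGraph release date')\n",
"print(answer)\n",
"print(len(trace))  # 4: user, ai tool call, tool result, ai final answer\n",
"```\n",
"\n",
"Swapping `scripted_model` for one that always requests a tool shows the cap at work: the function returns `None` after `max_rounds` model calls instead of looping forever."
]
},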
{
"cell_type": "markdown",
"id": "b0620cfb",
"metadata": {
"papermill": {
"duration": 0.003357,
"end_time": "2026-05-27T03:49:14.349705+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:14.346348+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 4 · Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8c6323b0",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-27T03:49:14.362602Z",
"iopub.status.busy": "2026-05-27T03:49:14.362602Z",
"iopub.status.idle": "2026-05-27T03:49:15.586048Z",
"shell.execute_reply": "2026-05-27T03:49:15.585720Z"
},
"papermill": {
"duration": 1.230961,
"end_time": "2026-05-27T03:49:15.586048+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:14.355087+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"Provider: nebius · Model: meta-llama/Llama-3.3-70B-Instruct ─────────────────────────────────────────────────────\n",
"\n"
],
"text/plain": [
"\u001b[1;36mProvider: nebius · Model: meta-llama/Llama-\u001b[0m\u001b[1;36m3.3\u001b[0m\u001b[1;36m-70B-Instruct\u001b[0m \u001b[92m─────────────────────────────────────────────────────\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"LangSmith tracing: enabled \n",
"\n"
],
"text/plain": [
"LangSmith tracing: enabled \n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Tavily key set: True \n",
"\n"
],
"text/plain": [
"Tavily key set: \u001b[1mTrue\u001b[0m \n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from agentic_architectures import get_llm, enable_langsmith, settings\n",
"from agentic_architectures.architectures import ToolUse\n",
"from agentic_architectures.ui import print_md, print_header, print_step\n",
"\n",
"traced = enable_langsmith()\n",
"print_header(f\"Provider: {settings.llm_provider} · Model: {settings.llm_model}\")\n",
"print_md(f\"LangSmith tracing: {'enabled' if traced else 'disabled (no LANGSMITH_API_KEY)'}\")\n",
"print_md(f\"Tavily key set: **{settings.tavily_api_key is not None}**\")"
]
},
{
"cell_type": "markdown",
"id": "7e0e03c5",
"metadata": {
"papermill": {
"duration": 0.007779,
"end_time": "2026-05-27T03:49:15.593827+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:15.586048+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 5 · Library walkthrough\n",
"\n",
"Source: [`src/agentic_architectures/architectures/tool_use.py`](../src/agentic_architectures/architectures/tool_use.py).\n",
"\n",
"The class is short — most of the heavy lifting is done by LangGraph prebuilts (`ToolNode`, `tools_condition`) and the library's `web_search_tool` wrapper around `langchain_tavily.TavilySearch` (which replaces the deprecated `TavilySearchResults` from the original repo).\n",
"\n",
"Key design choices:\n",
"\n",
"1. **`provider_supports_tools()` check at construction time.** Fails fast with a helpful error if you try Tool Use on a provider that doesn't support tool-calling (e.g., `huggingface`).\n",
"2. **Default system prompt** caps over-search (\"after at most 2–3 searches, STOP and answer\"). The original `bind_tools(...)` call alone produces 6+ searches per question for chatty models like Llama 3.3 — see § 9.\n",
"3. **Single `_agent` node** that prepends the system message on the first turn only. The `add_messages` reducer takes care of the rest.\n",
"4. **`recursion_limit = 4 × max_rounds + 4`.** LangGraph's recursion limit counts super-steps, and each round costs at least two (one `agent` step and one `tools` step), so we budget generously."
]
},
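{
"cell_type": "markdown",
"id": "c58e02b6",
"metadata": {},
"source": [
"The budget formula above is plain arithmetic, shown here as a small sketch. The helper name `recursion_limit_for` is hypothetical; `recursion_limit` is the config key LangGraph reads at invocation time.\n",
"\n",
"```python\n",
"def recursion_limit_for(max_rounds):\n",
"    # Formula quoted from the design notes: 4 x max_rounds + 4.\n",
"    return 4 * max_rounds + 4\n",
"\n",
"\n",
"config = {'recursion_limit': recursion_limit_for(4)}\n",
"print(config)  # {'recursion_limit': 20}\n",
"# A compiled graph would receive it as: graph.invoke(state, config=config)\n",
"```\n",
"\n",
"With `max_rounds=4` the limit is 20, comfortably above the steps that four agent/tool rounds plus a final answer need."
]
},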
{
"cell_type": "code",
"execution_count": 2,
"id": "0b57492f",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-27T03:49:15.601932Z",
"iopub.status.busy": "2026-05-27T03:49:15.601932Z",
"iopub.status.idle": "2026-05-27T03:49:15.626145Z",
"shell.execute_reply": "2026-05-27T03:49:15.626145Z"
},
"papermill": {
"duration": 0.024213,
"end_time": "2026-05-27T03:49:15.626145+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:15.601932+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"def __init__(tools, max_rounds, system_prompt):\n",
"def _agent(state):\n",
"def build():\n",
"def run(task):\n",
"\n",
"Default system prompt:\n",
"You are a research assistant with access to web search.\n",
"\n",
"Rules:\n",
"1. Use the search tool only when you need facts you don't already know.\n",
"2. After at most 2-3 searches, STOP searching and answer using what you found.\n",
"3. Cite your sources with URLs in the final answer.\n",
"4. If a search returns enough information, do NOT search again - answer the user.\n"
]
}
],
"source": [
"import inspect, ast\n",
"from agentic_architectures.architectures import tool_use as tu_mod\n",
"\n",
"src = inspect.getsource(tu_mod.ToolUse)\n",
"tree = ast.parse(src)\n",
"for node in ast.walk(tree):\n",
" if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):\n",
" args = ', '.join(a.arg for a in node.args.args if a.arg != 'self')\n",
" print(f\"def {node.name}({args}):\")\n",
"print()\n",
"print('Default system prompt:')\n",
"print(tu_mod.ToolUse.DEFAULT_SYSTEM_PROMPT)"
]
},
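{
"cell_type": "markdown",
"id": "3fa6d9e1",
"metadata": {},
"source": [
"The rules in the default system prompt (at most 2-3 searches, cite URLs) suggest a rough post-hoc check on a finished run. This is a heuristic sketch under stated assumptions: `diagnose`, `urls_in`, the `search_cap` default, and the example.com URL are hypothetical, and a missing shared URL only hints at an ungrounded answer rather than proving it.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"URL_RE = re.compile(r'https?://\\S+')\n",
"\n",
"\n",
"def urls_in(text):\n",
"    # Strip trailing punctuation so a URL ending a sentence still matches.\n",
"    return {u.rstrip('.,;)') for u in URL_RE.findall(text)}\n",
"\n",
"\n",
"def diagnose(tool_results, final_answer, search_cap=3):\n",
"    # Heuristic flags only; the threshold is an assumption, not library behaviour.\n",
"    flags = []\n",
"    if len(tool_results) > search_cap:\n",
"        flags.append('over-search')\n",
"    seen = set()\n",
"    for result in tool_results:\n",
"        seen |= urls_in(result)\n",
"    # No URL shared between the answer and any tool result: possibly ungrounded.\n",
"    if not (urls_in(final_answer) & seen):\n",
"        flags.append('possible result drift')\n",
"    return flags\n",
"\n",
"\n",
"results = ['Release notes at https://example.com/release-notes']\n",
"print(diagnose(results, 'See https://example.com/release-notes.'))\n",
"print(diagnose(results, 'I recall it from training.'))\n",
"```\n",
"\n",
"The first call returns no flags because the answer cites a URL that a tool actually returned; the second is flagged because it cites nothing."
]
},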
{
"cell_type": "markdown",
"id": "4c5ff4a8",
"metadata": {
"papermill": {
"duration": 0.004012,
"end_time": "2026-05-27T03:49:15.636203+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:15.632191+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 6 · State & messages\n",
"\n",
"`ToolUseState` has just one field — the message list. LangGraph's `add_messages` reducer means each node returns *the messages it produced* (not a replacement list); the reducer appends in order. That's why both `_agent` and `ToolNode` can return `{'messages': [...]}` without colliding.\n",
"\n",
"| Message type | Produced by | Contains |\n",
"|---|---|---|\n",
"| `SystemMessage` | first call to `_agent` | the cap-search instruction |\n",
"| `HumanMessage` | the caller | the task |\n",
"| `AIMessage` (with `tool_calls`) | `_agent`, mid-loop | the next tool(s) to call |\n",
"| `ToolMessage` | `ToolNode` | the tool's stringified result |\n",
"| `AIMessage` (no `tool_calls`) | `_agent`, terminal | the final answer |"
]
},
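{
"cell_type": "markdown",
"id": "b70c4e28",
"metadata": {},
"source": [
"The append-only behaviour described above can be mimicked in a few lines. This is a simplified stand-in, not LangGraph's `add_messages`: `append_messages` and the dict-shaped messages are hypothetical, and the real reducer works on LangChain message objects and covers more edge cases.\n",
"\n",
"```python\n",
"def append_messages(existing, new):\n",
"    # Simplified reducer: append new messages, or replace when an id already exists.\n",
"    merged = list(existing)\n",
"    index_by_id = {m['id']: i for i, m in enumerate(merged)}\n",
"    for msg in new:\n",
"        if msg['id'] in index_by_id:\n",
"            merged[index_by_id[msg['id']]] = msg\n",
"        else:\n",
"            index_by_id[msg['id']] = len(merged)\n",
"            merged.append(msg)\n",
"    return merged\n",
"\n",
"\n",
"state = [{'id': 'h1', 'type': 'human', 'content': 'task'}]\n",
"# Each node returns only what it produced; the reducer merges it into the state.\n",
"state = append_messages(state, [{'id': 'a1', 'type': 'ai', 'content': '', 'tool_calls': ['web_search']}])\n",
"state = append_messages(state, [{'id': 't1', 'type': 'tool', 'content': 'result'}])\n",
"print([m['type'] for m in state])  # ['human', 'ai', 'tool']\n",
"```\n",
"\n",
"Since nodes hand back only their own output, neither `_agent` nor `ToolNode` needs to know what the other has already written."
]
},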
{
"cell_type": "code",
"execution_count": 3,
"id": "b3de3b18",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-27T03:49:15.641959Z",
"iopub.status.busy": "2026-05-27T03:49:15.641959Z",
"iopub.status.idle": "2026-05-27T03:49:15.723274Z",
"shell.execute_reply": "2026-05-27T03:49:15.721478Z"
},
"papermill": {
"duration": 0.087643,
"end_time": "2026-05-27T03:49:15.725850+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:15.638207+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ToolUseState fields: ['messages']\n"
]
}
],
"source": [
"from agentic_architectures.architectures.tool_use import ToolUseState\n",
"print('ToolUseState fields:', list(ToolUseState.__annotations__.keys()))"
]
},
{
"cell_type": "markdown",
"id": "0d016516",
"metadata": {
"papermill": {
"duration": 0.005791,
"end_time": "2026-05-27T03:49:15.737488+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:15.731697+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 7 · Build the graph\n",
"\n",
"The cell below renders the **actual compiled `StateGraph`** as a PNG (via `mermaid.ink`). If this rendered diagram ever disagrees with the static one in § 2, the implementation has drifted from the documentation."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "5ff9a5bc",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-27T03:49:15.748250Z",
"iopub.status.busy": "2026-05-27T03:49:15.748250Z",
"iopub.status.idle": "2026-05-27T03:49:18.302352Z",
"shell.execute_reply": "2026-05-27T03:49:18.300990Z"
},
"papermill": {
"duration": 2.563533,
"end_time": "2026-05-27T03:49:18.303304+00:00",
"exception": false,
"start_time": "2026-05-27T03:49:15.739771+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAANgAAAD5CAIAAADKsmwpAAAQAElEQVR4nOydB2AURdvHZ/dKLr33QhICAUKJvAEUFRAQ9aUrrwgEAT+kCeInRf0AQSwgiIqKYASESInSW6QoTQidlxJKEJKQkEp6LuXa7vfsXXI5krtAILc3ezc/w7k7M7uX7P1vyvPMPCNmWRYRCJZGjAgEDCBCJGABESIBC4gQCVhAhEjAAiJEAhYQIdYn767y2pnS4hylopoBNEqEaIQYRNEsy1CIAmsXhVjEUtx/ukTuDHJoitGw8H/dfXQFuAOapRjtAaRxt+KO4YBl6r91TSLNcmUo7gJtKnedYTFazDLqB1Kk9pREQts5ifzD7WP6uCEBQhE7oo6sfxRHtuaXFigYhhWJaZmDSGJHi8RIrWAoEWI1NXLkXlntj04o2kROfZxyKFbDIqr+nWkQKKN9yJRWx1r90WKKUdc9+VpZUyyUFFFIwym75pOhKcQ88BnVuxaQymiNhlJVaxRVjErN2MlE/mGyAeP9kXAgQkR5GardcfdU1Yyrl7Tjs64dnndBgkaDjmwtSLsur67U+ATLhr0biISArQtxyzdZefeqQiKdBk3wQ9ZFQa5m38+ZleWaF4b5tenqiPDGpoX485xUqUQ0ZkELZL1cP1V+fMf9oEiHAf+D9TfNdoW4Zl5aYEvHl8f6IBtg9dz0Lv3cO/VwRbhio0Jc9cGdiGiXviO8kc2wek6ad7D94EmY1os0sj3Wzk9v0cbRplQIjP88LD+j8u/tBQhLbE6Iu3/KAXPdK+OsbWjyKLz9WfiVkyUIS2xMiBqUeati3PxQZJvQKLiV49oF6Qg/bEuI8YszvIPskQ0zaJK/sopJuVCBMMO2hFhWqHx9ujAMvObDr4Usafd9hBk2JMQ9cTkOzhKe/+IPP/xw165dqOm8+OKLWVlZyAz0Hx9QUa5GmGFDQsxJqw6J5Ltdvn79Omo6OTk5xcXFyDxIpAjc6H9uxqtStCEhqlRMTB9PZB5Onjw5ceLE5557bsiQIfPnzy8o4KwkMTEx2dnZn376aa9eveBULpevWrVqzJgxumLffPNNdXW17vI+ffps3rz57bffhkuOHTs2cOBASBw8ePCMGTOQGXD3kWanViKcsBUh3rlSSVPIzVeEzMDNmzenT5/epUuXrVu3zp49+9atWwsWLEBadcLrvHnzjh49CgcJCQnr1q0bPXr0t99+C+UPHToUFxenu4NEItmxY0dkZOSKFSueffZZKACJ0KYvW7YMmQH/UHtlJYNwwlbmI+akVYkkFDIPly5dkslkb731Fk3Tfn5+7dq1u337dsNisbGxUPOFhYXpTi9fvpyUlPTuu+8iboIY5erqOnPmTMQLPsHS5FNEiJagSq6haHMJMTo6GhrZ9957r1u3bj169AgODoYWtmExqPZOnToFDTdUmWo1N1zw8PDQ54J8EV94eNuxDF6uXVtpmrmpqWbzqrdp0+a7777z9vb+/vvvhw4dOmXKFKjtGhaDXGiLocDOnTvPnz8/btw4w1ypVIp4QyxCyFxfy8fDVoTo4CQ266Pv3r079AX37NkDvcPS0lKoHXV1nh6WZbdt2zZ8+HAQIjTfkFJeXo4sREl+NcIMWxGid6BUWa1B5uHChQvQ2+Pexdt7wIABMNQFkYEJxrCMSqWqqqry8amZdaZUKo8fP44sRF6Gghbh9dHbihDbdHWGlllZaZbWGRpiGCxv374djH/JyckwOgZF+vv729nZgfJOnz4NDTGMY0JDQ3fv3n3v3r2SkpKFCxdCz7KsrKyiwoi3DUrCKwyr4W7IDOTdrZY5EiFaCImUPn2gCJkBGA5Dg/vVV1+BO2TChAmOjo7QFxSLuYEgDKXPnTsHdSRUh1988QUMrocNGwZGxK5du06dOhVO+/btC7bGejcMCgoCUyIYHaFbicxAYU61b6AM4YQNTYz9bdm9yjL1uE9Ckc3z/f/+M35huL
2zWayqj4cN1YgvjvKVl2HnY+WfxF9ywMWHlQqRTS2w9/CTyBzonSuzh0wOMFpAo9GAwdloFowtwAoIZueGWeHh4WvXrkXmYZ0Wo1lOTk7gMzSaFRUVBR4aZIK05IrOvd0RZtjWmpV7/yh2rcp8Z1mEqQINu2s64COHD95oFvQF9WPhZqdci9EsMKFDF9NoFnxnYLRkNOvgxvy0ZPnEReEIM2xu8dSmJRmMho39yJqXkDbCihm3X50S4t+SR+P5o2Fza1ZGzg6pKFWfSTTXJCuc+WVBelArRwxViGxzFd/ExS0vHC4qy7etpmDzkntgwBo8CdOAOLa7wH7FzDt93wiIjLGJJSzxn2Z4BEhxDvZg0yFHfpyd6h8sGzotAFk1a+alyRxFoz4MQRhj60GY1nycplIwT7/iGd1LkGEFG2fHj9nZdypbRbv0G417ZBUSlg6d3FV45UQJ2HgDW9q/HOtHSZDQSbtaeXp/QXGeysFFPHZOC4SX6do4RIg1/L298OaFMkWVhqKRo4vYyU1q7yASSRiVsu75iMQijbpmCo8uqCaljdupfYT6CK9IIqZUtYE09fE2DaNr1gTkhEQR9/xrQsfW3gByaRpp1Ky+mD68rFhMq9WM/lR/ALZ2jZoCB6ZcrqmWq+E+zh6SXsO8g1oJpgdMhFifE7sKs1OrKss0KiUDz0ZjEJuVplmGqXGu6BSmc7VoD+BB1mSJJEijqr2mVl60GDFa/yLDMDRnq9D+o7i4xPU/AQreCDEapHsHbUpN9GLdTerei2IQy91HasdFm7WzF7l4iltHu0R2wT0aYkOIEPlm2rRpI0eOfOaZZxDBABLMnW/UarVuhhjBEPJE+IYI0SjkifANEaJRyBPhG5VKJZEI30TU3BAh8g2pEY1CngjfECEahTwRviFCNAp5InwDQiR9xIYQIfINqRGNQp4I3xAhGoU8Eb4hQjQKeSJ8Q4RoFPJE+AYM2kSIDSFPhFe4yYcMIxIJYaoqvxAh8gppl01BHgqvECGagjwUXiEzHkxBhMgrpEY0BXkovEKEaAryUHiFCNEU5KHwChGiKchD4RUyWDEFESKvkBrRFOSh8I2pWK42DhEir4BzLzc3FxEaQITIK9Au19sajaCDCJFXiBBNQYTIK0SIpiBC5BUiRFMQIfIKEaIpiBB5hQjRFESIvEKEaAoiRF4hQjQFESKvgBA1Gg0iNMAWd56yLOBcIVpsCBEi35DW2ShEiHxDhGgU0kfkGyJEoxAh8g0RolGIEPmGCNEoRIh8Q4RoFLLzFE9ER0fTdM3QEJ45HMPrgAEDFi5ciAhk1MwbHTt2RNxufhxgSqQoyt/fPzY2FhG0ECHyxJtvvuno+MBejZ06dWrdujUiaCFC5Im+ffsays7T03PEiBGIUAsRIn+MHTvWxcVFd9ymTZsOHTogQi1EiPzx/PPPR0ZGwoGrq+uoUaMQwQAyam6ABh3fXVxRplQrNTCo0Gi45yOWiuCU0u45DynwymjTRRJao+I2kdeXpGmK23WeGxdz24rrpjfQIm4jcKCkpORK8iUXJzcYRHNXiZGm1pJD0dy1+n3K9Zfoc7nNwg0+LLGEUqse+Oyk9mK/YPtOPZ2RACFCfIDfl2UV5FVLpCKWYTUqlhJRrE5wUqRRcnvLc4IAfYg4vSKtLnWK1G9QzxXglEghWrvVvEa71XxteYClGcSARLl0SkyxtcqjuMaJZRmqppzBJQ/cthaRBGlUD/zyUhnomLMN9RnuF/GUAxIUxKBdx65V2ZVlzOg5LZGQuXNJ/mdCHi31DY8SkhZJjVjD9uXZlXLN4KnByCrY8Hlq7KxwZ+FENyGDlRpy71X3GRWErAUvP9meNZlIOBAhciT/XQ7jBid3ClkL/uEOFWVC8miTPiIHNMqMClkTMkdKpRTSggQiRA41o9YwVtVXhp4/wyABQYRIwAIiRA6aAhDBghAhcoCtmEXWZcaiOD8NEg5EiBzgPNP+syK4PqKQvlpEiAQsIE
LkgEaM9BEtCxEiB8sgxrpcnQzFCuurRYTIwXKTYayqSqRZSljfLCJEDmK9sThEiFq42oPMQrIkRIgcDAUytKo6UWgtMxGiFsoKa0OBdTbINDAObuY+xmLcsfP3RV/Ob9IlgvtqkRqRg3PwYexZSUm5jqwdIsTHRC6Xb9m64ey5U+npdzw9vLp37/nWuMkymQyyGIZZ/t2XJ04elUqkffq83D6q00dz3tu25YCHh6darV6z9sfTZ07k5+e2bx89dPDrTz/9nO6GQ17tO27spNLSkvXxcfb29l1inpn6zkxPT6/33p9w+fJFKHDw4L49u446OTkha4Q0zRwURTe1S7V9R8KmzeuGvz76i8+/nThx+tFjh0BAuqwtWzfu2bt92tRZq1ZtsLd3AOUhbdQbeP3u+yVbt20aOmT4po17evboM/+T2ceO/6W7SiKR/PZbPBTbueOv9b9su5p8ad36nyD926/j2rZt369f/yN/nW+SCoXVRyQ1IgfLNtmx8vp/YkFJLVqE6U6Tky+fPZc0ccK7cHzg4N4ez/fu1bMvHI8aOQ7SdWUUCgVkjRwxdtDA1+D0368Mhqvif/0Z7qMrEBgYHDvqLe7IyRlqxFu3bqAnQFizOIgQHxOowM6dP7X4y/m379zSxTt0d/eAV41Gk56e+srLg/Qlezzf58qV/8IBCEupVILC9FnRnf71x/7dpWWlri6ucNq6dVt9lrOzS0WFHD0JxLMiOLjF7U0cZ8b9/H1i4k5olEFYvr5+q9esSPxjF6TLK+Qsyzo41AX+cnV10x3I5eXwOm36/9S7VXFRoU6ItuzfIULkYJmmtWQgtT17tw17beSA/kN1KTqRAQ723LJ2lapuLVZxcaHuwNOLW2Y84/050AQb3s3Hxw/ZPESIHNxYhW5CjQjtb1VVlZeXj+4UGtykU8d1x9Bk+/j4wlBaX/hk0jHdQVBgiJ2dHRw8FR2jSykuLtJWn80fkkEbnURI9SsZNXNwYxWmCR+bWCwOCQmF7l1W9j0wuCz5amGH9tHl5WUVFRWQ2/2ZHgcP7Tt3/jSIDEbQkK67CgQ3dsxEGJ1cvXoJtAvj5Zmzp3y7fPFD3w5q0Bs3ki/+95xhRds4XKwcQRm1iRAfk3lzvpDZycaOGxb75pB/de46fvxUOB36Wt+c3Owxb07o0OGp2R9MHf3m0Lt306AFR5x2JfD6xvA3Z838eFPCuoGDe4GtMcA/aMaMuQ99r4H9X4Xu46zZ71RWVqBHhBHYYIXEvuE4tbfgwuHSMfObJ/xSdXU12KuhytSdJvwWv3Hj2j27jyIeuXmm9Mz++1O/jkACgdSIHM1rcgPlTZg0atv2BGi1Dx85+PuWDYMGDUOERiGDFQ5uYmzzfSXHjplQWlp88ODen1d/7+3tC34UMGsjfmG4wQpZxSc0oH/CNmuAjunvfoAsCs0NVsi6ZqFBN2uNiAVCG6wQIXIwzV0jEpoKESKHNc7QFhhEiFq47hQRoyUhQtRCWduEA848TAYrgoNF1hYNjPteCcpVQYSoDYFbmAAAEABJREFUhbW6aGBCgwiRg2jQ4hAhaiHRwCwNESKHdjkpIlgQIkQOqVQskVlXlUgjiUSEhAOZfcMR1NKBEdLuOA+nJEclrK8WESKHX7hUIqXP/VGErIV7d+QB4ULaFJIIsYZXxgSkXCxGVsH+X3JYhn15jA8SDmSGdg1VVVXvT5/TwfUdD19ZWFsXO0dWbWoaBMX5ptkHErhTmkIMixr6CrnCVL0bcLNjWLruWlSz3En/vxq427LGTZz6C/UHYlpUmKPMSCmTOYhGzBbYBpdEiDX8+uuvUVFRndt3TlieWV6kVqoZRl33ZPR+Cr0kGj41XRlDj0atB1ubbJiu/VfvwevsRywXDKpuDoa+ZD3ZcWsOmZpi+jtL7CiJRKwS5XV4UdWqVSsfH1IjCoeioqLly5d/8skniC+mT58+fPjw7t27IzOwZs2auDguhpOzs7OLi0tISEinTp1at2
7duXNnhDe2br6ZO3cuKAPxiJeXl6OjIzIPo0aN2rdvX0ZGhlwuz8rKunnz5qFDh9zc3OAdd+3ahTDGRmvE3NzcM2fODB48GFkdq1atWr16db1E+JQvXLiAMMYWR82lpaXjx49/+umnkSWA74BCoUBmY9iwYYGBgYYpdnZ2mKsQ2ZoQc3JyoMFSq9V79+719fVFluCDDz64ffs2MhvQ9D/33HP6hg4OFi1ahLDHhoR4+fLlCRMmwOfk6emJLAd8AcwR7MaQESNGeHtzAZ90LfLOnTtXrlyJ8MYmhJiXl4e0cTL37NmjC4NkQZYsWRIWFobMSVBQUExMDMMwfn5cnLGvv/5aKpVOmzYNYYz1D1ZgtHj48GGw0SA8gL4BVIpisdntFf369Tt48KD+9NSpU3PmzImPjweZIvyw5hqxrIwLw1VZWYmPCoHJkyfn5+cj82OoQuCZZ56BNnrq1KkHDhxA+GG1Qly7dm1iYiLSdpgQTkBzCQZnZAnAxA1aPH78+DfffIMwwwqbZpVKdf/+fXjiU6ZMQQRjbNq0CborDc2NFsTahAgPF/pGUOtA9xxhCbg9oJem2+3CgoANYdKkSevXrwcHIMIAq2qat27dCjZCcLBiq0IgNja2uroaWRrwQUMbvWDBAmg6EAZYiRC3bNkCr71794ZvOcKbgIAATL4nEokE2ujk5OTPP/8cWRprEOKMGTN0HQwPDw+EPQkJCTzYbh6duXPntmvXbtSoUbrdYiyFsPuI58+fB8stWObqeVdx5u7duy1atECYkZKSMmbMmJ9++gmabGQJhFojKpVK8O7ruvwCUiH0DqHuQfgRGRl5+vTp7777bvPmzcgSCFKIRUVFBQUFy5Ytw3++Zz2g/QkPD0e4smbNmuzsbGisEe8IrGkG/b399ttgrHZ3d0cE87B///64uDiw7Dg7OyO+EJgQt2/f3qVLl+DgYCRMNBpNTk4Ont5eQ8DYCV3GxYsXd+vWDfGCMJrm1NTUd955Bw5effVV4aoQAJcP/gYmAGyxR44ciY+Ph8YH8YIwhAj+ko8//hgJH4qiMBwym2LFihUKhQKsY8j8YN00X7t27cqVK7jNWrA1jh07tmjRIqgdzbo+Fd8aEYbGS5cuHTBgALIiwOoEw1IkKHr27Llhw4axY8devXoVmQ18hQjuh3Xr1vE5cOOBqqqq+fPnC86J4OXllZiYCFZG3Vx3c4CpEDdu3Hj27Flkdbi6uv7444979uxhGOHt63Lp0iXzrTjDdIF9fn6+lYX51yORSAYNGpSZmQluIQH5hP7555+ICDPudYqpEGGAgtXMgGYHjFCDBw/etGmT+aI+NC8gxFatWiGzgWnT7OfnB/0SZNXs2rUrJSVFLpcjIXDnzh2z1oiYCnHHjh27d+9G1g74yrOyspKSkhD2mLtpxlSI4FMGVxiyASIjIxMSEvCvF2/fvm1WIWJq0AZXGIwrLRUVhH/AuAh/L7Y+6NLSUnCu/vXXX8hsYFojent7244KkXb9QHFxsaXmAj4Uc1eHCFshHjhw4LfffkO2RIcOHaBeBIs3wg/bFWJhYaHgXGFPjm7xzcWLFxFmmNt2g7AV4ksvvfTGG28g28PBwUEmk33xxRcIJ6BGNLcQMTUaWzZynGVp167dzZs3EU7YbtN87Nix9evXI1sFhqjwioklFbyRMHY0dzg/TIUI9oKMjAxk28DwZebMmcjS8NBBRNg2zT169BDcCr1mJywsbOzYscjS8NAuI2xrRDc3N/xXGPFA+/bt4dWyUeRsWohnz57FP+wzb0C9aMElV/w0zZgKEXyvaWlpiKDF3d196dKlcKAPT/Pyyy8PHDgQmR+FQpGfn8/DyklMhRgTE6NbP0rQoVsyARbvioqKAQMGFBQUgEuQhyDEPFgQdWAqRBcXFwEtu+SN5cuXv/LKK7m5uUi7/MWssxB0mHv2lx5MhXjt2rVly5YhwoMMHz68srJSd0xRVEpKik6U5oOfkQrCVojwuM26PZ
MQGTly5J07dwxT8vLywPKPzAk/IxWErRDBzTVr1ixEMEA3YVEkEulTlErloUOHkDkx9woBPZgatB0dHXEO32YREhISLl68eO7cuTNnzoBVIScnx9exM1vmcWj7LX9/v7pylHZzccPpzlDbMNxu5Gyj6yLrtjZnazZILy8vD/V8PvM6lYnK6rYuR4aHhtcbSaVpyifIzivw4aGa8ZqhPX78eHjE8CtB01xWVgZmC6gG4PjPP/9EBAN+WZhaWaqhaKTh7DkP6IvTw4Oio2mkXUXN1itZD70QtRc3KGlwT31JQ+1RtW9gqCexBARGSaRUx2fdu/3bDZkGrxoRWuQNGzbot34AUwXSztZGBAPiPkr1DrEfNtkf4bt3wgNcSyq9erLIP9QupJ3JnY7w6iPGxsY29Ox17doVEWqJ+7/UtjGefUcKRoVAVHfX4bPCEtfnnD9YaqoMXkL08fHp37+/YYqnpyeeQactwh/r88USUXRfVyRA2nZzu3Ss0FQudqPmESNGGFaK0dHRmGyNhAN5GdVe/jIkTDr38VCpWKWJdbPYCRF8KuBF1cUb8fDwGD16NCLUolKoxTIBb40DY6aCPOOrw3D8q/SVYnstiFCLWsmqlSokWBgNy5jYVeiJRs3KKpS0ryDvbnVFmVqlBAsBBe9Ul/3gUB5sDSxj7LS2GEVTLMMdgXWgV4tF6iC1VCRZOTsVmTBcUZzpiar3dnXGMMM/Ugw3p8Ui5OghCWwp6z7AdhfEYMtjCnF/fF7GjQqVgqElYOqnaanIzkkKMmINBFNPPfUkUndae1RnnaKQlH3QRmVMXkblaUKIIlC4RqEuylXlZVRd+KvIwVnSurPz80OIInGhyUL845e81GtykYhy9nYOjBLA3ncN0Sg1964VXTlRcvVkSecX3J7+t2DkCM2IoMNGao3txn//pgnxpw/SoLpp0cHfyce8a7rMikgqavEUGMm9798phdrxxpnycZ+EIiEAnRlB753I9d6Q8d//UQcrGTerfnj/trOPY5teIYJWoSHeLV2j+oYhkeTHmXcQwfxwlaGJ79EjCbH0vnp3XFa73mEB7aywUxXWxc8v0lsQWoRmzUoDOj+CEG9frty45G77F8NoEbJWPIIcw7oEr5iJ/QxIquafQNH2EY1nPVyIB9bnRHQV8K5jj4i9i8irhcdPH2G9YkvofURues7jNc1xc9KdfZ2kTtZbGRrgG+FKi+lNSzMRgXcaE+KxrQVqpSako5UHVTekVfegwmxFTpoSEcwCix6jab52ptQ7XJCWwifBycN+7+osRDADlMmW2bQQk3YXgqfEO9QFYcmlq3/OnNdNXlGMmpuwGD9FpaasAMudoSgLDFWGvNo3/tfVqDl4nD7i9bNlDm4m59NaNyIpvT8e23i1TZPiJws/TPxjF8Iek0KsqtD4Rdhcu6zD2de5MFeBMIRFLGraqDkl5ToSAsZdfDfOyMGtae9mrtno6RlXDh5ZnXnvupOje9vI5/q9MF4m43YCO3l6y6Fjaye/tTI+4aO8/FR/34ge3Ud06VyzU+7e/d+fv5xoJ3V4quNLPl4hyGz4tfQozSpDwueFPjHwuvSrT1eu+mbPrqNwfPLksfXxcXcz0lxd3SIiIqdP+8DXt2YFYCNZOsBytG375gMH9mbeu9siJCwm5um3xk02XN76CLBN6yOm35CLJOZaV1VQmPnTumkqlWLqhNVjRn6Zk/fPyrWTNdrlaCKxpKqqfOe+r14f8n9LF57u2L737zs/Ky7hghkknd2WdHbrq/1nTZ/4i6d7wKEja5DZEEs5B8Y/F4SxOVkj7E88Ca+zZs7TqfD8hTMfL5jVr1//3xMS589bnJeX8+13i3UlG8nSs317woaNa4e9NjJh096BA1/bl7gz4bd41DRMOoaMC7G8WCMWmatbfPHyfrFIMnbEl77eoX4+4f8ZPCcrJyX5Rk3EAo1G9eIL41sEd6AoKia6P3wLs3JuQfqJU793jOoD0nRwcIE6Mi
I8BpkTWkTnZmLZOj8Ba39Z2eP53qAkqPOiojpOmfz+6dMnbmrb7kay9Fy+cjEyst1LLw1wc3Mf0H/oih/Wdev6LGomjAtRrdKYz6kJ7XJwUDtHx5pVrh7u/p4eQWl3L+kLhARG6Q4c7Lkxe1V1OcixoCjT1ydMXyYooA0yKyxbWYHdXGjO1/wE4+bU1H/atInSn0a2bgevN29eazxLT/v2nS5cOLNk6cL9B/aUlpUGBgRFRDTbciKT7S+DzGW/qKqWZ2ZdB+OLYWJZed36roZT7qoVFQyjsbNz0KdIpWYe0VOUiMJuHQXDck4+9FjI5XKFQmFnV7f2ysGBe56VlRWNZBneAepLBwfHk0nHvlzyiVgs7tXrxYlvv+vl1YRV5424yo0LUWonopC5fJrOzp5hLaJf6j3BMNHRsbElkjI7R5oWqVTV+hSFshKZE6iDZQ7YOTafZPKNTMbprLq6bu1ShVZnnh5ejWQZ3oGmaWiR4Sc9PfXixbPr4uMqKuRffNaUsMqUSYO2cSG6eEoKcszl5grwbXXhcmJ46FP6iA65+aneno2NgqGOdHfzT8+42rO2T3Ij5SQyJwzD+oXhZ0alHn+GNtRhka3bXrt2RZ+iOw5v2aqRLMM7wHi5deu2YWEtQ0PD4adcXr4vcQdqCtwioyYZtFt1ctaozdU0g0WGYZjdf3yjVFbn37+798APy34YmZP3kClYndr3vXr9CDhU4Pjw3/F37yUjs6GUa6AVjOjkgDBDO7G0CS2VnZ2dt7fP+fOn/3vpvFqtHjpk+ImTR7dt21xWXgYpP678uvNTXVpFRELJRrL0/HV4P4ysk5KOQwcRhjJ/nzjcPqoTaiaM14hhHezhDy6/X+3s3fzLuWHYO3PqpiN///rtqjH599NDgqL+M2TOQwcffXuOq6go3pm4bMPvc6BlH/TKe5u2fGymOVF5acViGa4TjppYIY4a+dYv61adPZe0edNesM7cL8j/bcuvP/y4DGyEMf96+u3xUwN8t4UAAAP+SURBVHXFGsnSM+P9uT+s+GrOvPcRt+TcE9ro/wyLRc2EyWhg6z+9q2FF4V38ke2RcjwzIFQ2cKIfwoyVs+8ERti/MDwACZN1C24PnRQYFGmkz2NyYNiph3tVqbUZ0h4RlUKNoQoRsoJ1Aixq6iq+6J4up/bez7lZ5N/GuMe5pDTvqx9GGs2yt3OqUhh3S/h5h0+d8DNqPuZ+3sdUFnhrRCIjf2BoSMfxo02O9W6fyXZylSAsEfLsbB2Uqb+hMT9el5c9z+4vNCVEZyfP96f8ajQLRiFSqfHOJU03s+fQ1O/A/RoqhVRiZMGhWNSYD726TDF5MR/Beh8DWkRRtHWunmpMFjF93K6eKE07nxMWY6SnCJWNh7vlOyvN+zvcOp4Z3MpBhGvoQUbD6qKyWB8PcR6Mm9+iulxRkmNe6zEm3Lt6nxaxgyfjOxSw6eWkkxe3vHctH1k7OTeKywsqxn8WhjCGiy1knRXioyywp9HkJS2TD6UVZVltvZh5pbA0vwz+TIQ31BN4VjChyWtWDBGJ0NSvI3Ju5qeey0FWR8rfmZUlFZMWC2A3DW2NKOwqsWnzEY3yzlctEaO+fjg9N6X5lyxZhPRL+df+Snd1E09chHWLrIerEW1w1NyQtxaEnjlQfPlocXF2mczJzjfCw8FdOMHtaynKkhemlSoVKqmMHjohOKC1YP4ErkYU9qjZ5C/fZKtet5fc4efCnyWXT5SkX8zm5lOIaA4RhR6MCVu7z0ztVjD1IsaaiMOpL0ZRqGalEMXtP6OP4ak9YCkuXmzNqW6WW90pd4cHCsBYGDE0wzBqhYbhspCzh+TFEQGh7YW2TFHwtaHJP+Axzcv/6usGP3Bw+7/yW5fk5UUqRRWjtXLVlRGJKY1aKySttvSnOmiRLpJLzTGjqb1KQmlUXCromNVGe4FfnqZZXYEacdNazTO1qqW0d1DDW3A7MdFiitWwkAWXgPkc0sUSipYgmYPUxV
3c7hmXwJZCDcyPrECKJnhSP0fEU07wgwj8wCIrtd7guikkwSgSqUgsEXBALLFY23gZzUIE4SCRUYpKLGOhPBrQaQ8KNz40FPDuMTZIaFtcQ1A8Akm7C+zsRchEhU6EKCR6vuYBg5XDmwTpcb2bXNb7Pz6mcvHar5nwKMR/lgHmg869vFpECWD4Ly9hL/55/+7N8jFzQx1dTXZwiRAFyZZvs4pylRo1ozHc6qvBzuCclfdxfdPGtw9HD91//AHAugyeIHsncb9RvgERjX1tiBCFjBJVVWnqTh9wDBiY+A3RuxmMJhq4GViaoup5cfT+g4auCMqE00Qksn804x4RIgELiPmGgAVEiAQsIEIkYAERIgELiBAJWECESMCC/wcAAP//XBFHRQAAAAZJREFUAwCMoYZjcDValQAAAABJRU5ErkJggg==",
"text/plain": [
"__start__
]):::first\n", "\tagent(agent)\n", "\ttools(tools)\n", "\t__end__([__end__
]):::last\n", "\t__start__ --> agent;\n", "\tagent -.-> __end__;\n", "\tagent -.-> tools;\n", "\ttools --> agent;\n", "\tclassDef default fill:#f2f0ff,line-height:1.2\n", "\tclassDef first fill-opacity:0\n", "\tclassDef last fill:#bfb6fc\n", "\n" ] } ], "source": [ "from IPython.display import Image, display\n", "\n", "arch = ToolUse(max_rounds=4)\n", "graph = arch.build()\n", "display(Image(graph.get_graph().draw_mermaid_png()))\n", "print(arch.diagram())" ] }, { "cell_type": "markdown", "id": "1a0d25d1", "metadata": { "papermill": { "duration": 0.008397, "end_time": "2026-05-27T03:49:18.317625+00:00", "exception": false, "start_time": "2026-05-27T03:49:18.309228+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 8 · Live run\n", "\n", "Concrete task: time-sensitive research that the model can't possibly have memorized. The question below asks about the *current* state of an open-source project — the model's training cutoff is well in the past, so an honest answer requires real web search and the citation requirement forces grounding." ] }, { "cell_type": "code", "execution_count": 5, "id": "961ce3b8", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T03:49:18.330237Z", "iopub.status.busy": "2026-05-27T03:49:18.325748Z", "iopub.status.idle": "2026-05-27T03:49:52.522374Z", "shell.execute_reply": "2026-05-27T03:49:52.520961Z" }, "papermill": { "duration": 34.220081, "end_time": "2026-05-27T03:49:52.537706+00:00", "exception": false, "start_time": "2026-05-27T03:49:18.317625+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "Final answer ──────────────────────────────────────────────────────────────────────────────────────────────────────\n", "\n" ], "text/plain": [ "\u001b[1;36mFinal answer\u001b[0m \u001b[92m──────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
The latest stable Python release version as of 2026-05-27 is Python 3.14.5. Two user-visible features new in this \n",
"release are compression.zstd and except* statements. \n",
"\n",
"For more information, see the official Python blog at https://blog.python.org/2026/05/python-3145-is-out and the \n",
"Python documentation at https://docs.python.org/3.16/whatsnew/3.16.html. \n",
"\n"
],
"text/plain": [
"The latest stable Python release version as of 2026-05-27 is Python 3.14.5. Two user-visible features new in this \n",
"release are \u001b[1;36;40mcompression.zstd\u001b[0m and \u001b[1;36;40mexcept*\u001b[0m statements. \n",
"\n",
"For more information, see the official Python blog at https://blog.python.org/2026/05/python-3145-is-out and the \n",
"Python documentation at https://docs.python.org/3.16/whatsnew/3.16.html. \n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"2 tool call(s) · 1 final agent round(s) · tools used: tavily_search ───────────────────────────────────────────\n", "\n" ], "text/plain": [ "\u001b[1;36m2\u001b[0m\u001b[1;36m tool \u001b[0m\u001b[1;36mcall\u001b[0m\u001b[1;36m(\u001b[0m\u001b[1;36ms\u001b[0m\u001b[1;36m)\u001b[0m\u001b[1;36m · \u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;36m final agent \u001b[0m\u001b[1;36mround\u001b[0m\u001b[1;36m(\u001b[0m\u001b[1;36ms\u001b[0m\u001b[1;36m)\u001b[0m\u001b[1;36m · tools used: tavily_search\u001b[0m \u001b[92m───────────────────────────────────────────\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from datetime import date\n", "\n", "TASK = (\n", " f\"As of {date.today().isoformat()}, what is the latest stable Python release \"\n", " f\"version, and name 2 user-visible features new in that release. \"\n", " f\"You MUST cite at least 1 source URL (e.g. python.org or PEP page) — \"\n", " f\"answers without a URL will be considered ungrounded.\"\n", ")\n", "\n", "result = arch.run(TASK)\n", "\n", "print_header(\"Final answer\")\n", "print_md(result.output)\n", "print()\n", "print_header(\n", " f\"{result.metadata['tool_calls']} tool call(s) · \"\n", " f\"{result.metadata['rounds']} final agent round(s) · \"\n", " f\"tools used: {', '.join(result.metadata['tools_used']) or 'none'}\"\n", ")" ] }, { "cell_type": "markdown", "id": "1c59a2a8", "metadata": { "papermill": { "duration": 0.01702, "end_time": "2026-05-27T03:49:52.571820+00:00", "exception": false, "start_time": "2026-05-27T03:49:52.554800+00:00", "status": "completed" }, "tags": [] }, "source": [ "### 8.0 · What just happened, briefly\n", "\n", "Look at the **tool call count** above. Three regimes you might see, each meaningful:\n", "\n", "- **`tool_calls = 0`** — the agent answered from parametric knowledge, ignoring the citation requirement. 
That's *result drift*, the most dangerous Tool-Use failure mode (the answer *looks* confident but is ungrounded).\n", "- **`tool_calls = 1–3`** — focused use; the agent searched, found enough, answered. This is what we want.\n", "- **`tool_calls ≥ 4`** — *over-search*. The agent kept searching past the point of diminishing returns. Usually a sign that either (a) the model didn't trust the first results, or (b) it ignored its own system-prompt cap.\n", "\n", "§ 9 below will quantify which regime this specific run fell into." ] }, { "cell_type": "markdown", "id": "fdcff209", "metadata": { "papermill": { "duration": 0.007624, "end_time": "2026-05-27T03:49:52.594358+00:00", "exception": false, "start_time": "2026-05-27T03:49:52.586734+00:00", "status": "completed" }, "tags": [] }, "source": [ "### 8.1 · Full trace\n", "\n", "Every event the agent took, in order. `tool_call` events show *what the model asked for*; `tool_result` events show what came back; `agent` events are the model's natural-language outputs (only the final one has no tool calls)." ] }, { "cell_type": "code", "execution_count": 6, "id": "a63ededd", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T03:49:52.642752Z", "iopub.status.busy": "2026-05-27T03:49:52.641722Z", "iopub.status.idle": "2026-05-27T03:49:52.755367Z", "shell.execute_reply": "2026-05-27T03:49:52.751294Z" }, "papermill": { "duration": 0.144899, "end_time": "2026-05-27T03:49:52.757796+00:00", "exception": false, "start_time": "2026-05-27T03:49:52.612897+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
› [1] USER\n", "\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m]\u001b[0m\u001b[1m USER\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
As of 2026-05-27, what is the latest stable Python release version, and name 2 user-visible features new in that \n", "release. You MUST cite at least 1 source URL (e.g. python.org or PEP page) — answers wi\n", "\n" ], "text/plain": [ "As of \u001b[1;36m2026\u001b[0m-\u001b[1;36m05\u001b[0m-\u001b[1;36m27\u001b[0m, what is the latest stable Python release version, and name \u001b[1;36m2\u001b[0m user-visible features new in that \n", "release. You MUST cite at least \u001b[1;36m1\u001b[0m source URL \u001b[1m(\u001b[0me.g. python.org or PEP page\u001b[1m)\u001b[0m — answers wi\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
› [2] TOOL CALL → tavily_search\n", "\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1m]\u001b[0m\u001b[1m TOOL CALL → tavily_search\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
`latest stable Python release version and new features`\n",
"\n"
],
"text/plain": [
"`latest stable Python release version and new features`\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"› [3] TOOL RESULT (tavily_search)\n", "\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1m]\u001b[0m\u001b[1m TOOL RESULT \u001b[0m\u001b[1m(\u001b[0m\u001b[1mtavily_search\u001b[0m\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{'error': ValueError('Error 400: When time_range is set, start_date or end_date cannot be set')}...\n", "\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\u001b[32m'error'\u001b[0m: \u001b[1;35mValueError\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'Error 400: When time_range is set, start_date or end_date cannot be set'\u001b[0m\u001b[1m)\u001b[0m\u001b[1m}\u001b[0m\u001b[33m...\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
› [4] TOOL CALL → tavily_search\n", "\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m4\u001b[0m\u001b[1m]\u001b[0m\u001b[1m TOOL CALL → tavily_search\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
`latest stable Python release version and new features`\n",
"\n"
],
"text/plain": [
"`latest stable Python release version and new features`\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"› [5] TOOL RESULT (tavily_search)\n", "\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m]\u001b[0m\u001b[1m TOOL RESULT \u001b[0m\u001b[1m(\u001b[0m\u001b[1mtavily_search\u001b[0m\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
{\"query\": \"latest stable Python release version and new features\", \"follow_up_questions\": null, \"answer\": null, \n", "\"images\": [], \"results\": [{\"url\": \"https://blog.python.org/2026/05/python-3145-is-out\", \"title\": \"Python 3.14.5 is \n", "out! | Python Insider\", \"content\": \"## Major new features of the 3.14 ser...\n", "\n" ], "text/plain": [ "\u001b[1m{\u001b[0m\u001b[32m\"query\"\u001b[0m: \u001b[32m\"latest stable Python release version and new features\"\u001b[0m, \u001b[32m\"follow_up_questions\"\u001b[0m: null, \u001b[32m\"answer\"\u001b[0m: null, \n", "\u001b[32m\"images\"\u001b[0m: \u001b[1m[\u001b[0m\u001b[1m]\u001b[0m, \u001b[32m\"results\"\u001b[0m: \u001b[1m[\u001b[0m\u001b[1m{\u001b[0m\u001b[32m\"url\"\u001b[0m: \u001b[32m\"https://blog.python.org/2026/05/python-3145-is-out\"\u001b[0m, \u001b[32m\"title\"\u001b[0m: \u001b[32m\"Python 3.14.5 is \u001b[0m\n", "\u001b[32mout! | Python Insider\"\u001b[0m, \u001b[32m\"content\"\u001b[0m: \"## Major new features of the \u001b[1;36m3.14\u001b[0m ser\u001b[33m...\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
› [6] AGENT\n", "\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m6\u001b[0m\u001b[1m]\u001b[0m\u001b[1m AGENT\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
The latest stable Python release version as of 2026-05-27 is Python 3.14.5. Two user-visible features new in this \n", "release are `compression.zstd` and `except*` statements. \n", "\n", "For more information, see the official Python blog at https://blog.python.org/2026/05/python-3145-is-out and the \n", "Python documen\n", "\n" ], "text/plain": [ "The latest stable Python release version as of \u001b[1;36m2026\u001b[0m-\u001b[1;36m05\u001b[0m-\u001b[1;36m27\u001b[0m is Python \u001b[1;36m3.14\u001b[0m.\u001b[1;36m5\u001b[0m. Two user-visible features new in this \n", "release are `compression.zstd` and `except*` statements. \n", "\n", "For more information, see the official Python blog at \u001b[4;94mhttps://blog.python.org/2026/05/python-3145-is-out\u001b[0m and the \n", "Python documen\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "for i, t in enumerate(result.trace, 1):\n", "    if t['type'] == 'user':\n", "        print_step(f\"[{i}] USER\", t['content'][:200])\n", "    elif t['type'] == 'tool_call':\n", "        args = t['args'] if isinstance(t['args'], dict) else str(t['args'])\n", "        query = args.get('query', args) if isinstance(args, dict) else args\n", "        print_step(f\"[{i}] TOOL CALL → {t['tool']}\", f\"`{query}`\")\n", "    elif t['type'] == 'tool_result':\n", "        snippet = t['content'][:300].replace('\\n', ' ')\n", "        print_step(f\"[{i}] TOOL RESULT ({t['tool']})\", snippet + '...')\n", "    elif t['type'] == 'agent':\n", "        print_step(f\"[{i}] AGENT\", (t.get('content') or '')[:300])\n", "    print()" ] }, { "cell_type": "markdown", "id": "ba45894a", "metadata": { "papermill": { "duration": 0.026235, "end_time": "2026-05-27T03:49:52.834361+00:00", "exception": false, "start_time": "2026-05-27T03:49:52.808126+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 9 · What we just observed\n", "\n", "The cells above are live. 
Below: a quantitative breakdown of the **actual** tool-call sequence the Nebius-hosted Llama-3.3-70B agent produced on this run.\n", "\n", "### 9.1 · Quantitative summary\n", "\n", "| Metric | Value |\n", "|---|---|\n", "| Tool calls made | **2** |\n", "| Tools used | tavily_search |\n", "| Final agent rounds | 1 |\n", "| Final answer length (chars) | 583 |\n", "\n", "**Queries the agent issued to Tavily:**\n", "\n", "| # | Query |\n", "|---|---|\n", "| 1 | `latest stable Python release version and new features` |\n", "| 2 | `latest stable Python release version and new features` |\n", "\n", "\n", "### 9.2 · Pathologies surfaced in this run\n", "\n", "- **Repeated queries.** The second call repeated the first query string verbatim. The trace in § 8.1 shows why: the first call came back with a Tavily `Error 400`, so the agent re-sent the same query and the retry succeeded. (The trace prints only the `query` argument, so a change to any other argument would not be visible here.) The act-only loop has no explicit step where the model reasons about a failed call before acting again. ReAct's *thought* step partly fixes this because the model has to justify each search.\n", "\n", "\n", "### 9.3 · The final answer (verbatim)\n", "\n", "> The latest stable Python release version as of 2026-05-27 is Python 3.14.5. Two user-visible features new in this \n", "> release are compression.zstd and except* statements. \n", "> \n", "> For more information, see the official Python blog at https://blog.python.org/2026/05/python-3145-is-out and the \n", "> Python documentation at https://docs.python.org/3.16/whatsnew/3.16.html. \n", "\n", "### 9.4 · The takeaway\n", "\n", "Tool Use is the **right** pattern when the model needs one or two facts from outside its training data. It's the **wrong** pattern when you need:\n", "\n", "- *Multi-step reasoning* between calls → use **ReAct (nb 03)**.\n", "- *Guaranteed grounding* of the final answer → use **Self-RAG (nb 25)** or **Corrective RAG (nb 24)**.\n", "- *Recovery* from failed tool calls → use **PEV (nb 06)**.\n", "\n", "The pathologies you saw above are not bugs in the implementation — they're inherent to the act-only loop. 
They motivate the next several notebooks." ] }, { "cell_type": "markdown", "id": "9c3d4894", "metadata": { "papermill": { "duration": 0.021154, "end_time": "2026-05-27T03:49:52.867643+00:00", "exception": false, "start_time": "2026-05-27T03:49:52.846489+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 10 · Try other providers\n", "\n", "Tool Use needs **tool-calling support**. The library's capability matrix (`provider_supports_tools`) gates this: `ToolUse(...)` will refuse to construct for a provider without tool-calling (e.g., `huggingface`). With any provider that supports it, the same notebook runs unchanged." ] }, { "cell_type": "code", "execution_count": 7, "id": "93ea2c93", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T03:49:52.903081Z", "iopub.status.busy": "2026-05-27T03:49:52.903081Z", "iopub.status.idle": "2026-05-27T03:49:52.932647Z", "shell.execute_reply": "2026-05-27T03:49:52.929403Z" }, "papermill": { "duration": 0.047263, "end_time": "2026-05-27T03:49:52.932647+00:00", "exception": false, "start_time": "2026-05-27T03:49:52.885384+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[skip] openai: no API key in .env" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "[skip] anthropic: no API key in .env\n", "[skip] groq: no API key in .env\n" ] } ], "source": [ "from agentic_architectures.llm.factory import provider_supports_tools\n", "\n", "PROVIDERS_TO_TRY = [\"openai\", \"anthropic\", \"groq\"]\n", "for p in PROVIDERS_TO_TRY:\n", "    key = settings.api_key_for(p)\n", "    if key is None or not key.get_secret_value():\n", "        print(f\"[skip] {p}: no API key in .env\")\n", "        continue\n", "    if not provider_supports_tools(p):\n", "        print(f\"[skip] {p}: provider does not advertise tool-calling\")\n", "        continue\n", "\n", "    print_header(f\"Re-running Tool Use on {p}\")\n", "    other_llm = get_llm(provider=p)\n", "    other_arch = ToolUse(llm=other_llm, max_rounds=2)\n", "    r = other_arch.run(\"What is the current price (USD) of one Bitcoin? Cite the source URL.\")\n", "    print(r.output[:400])\n", "    print(f\" tool_calls: {r.metadata['tool_calls']}, rounds: {r.metadata['rounds']}\")\n", "    print()" ] }, { "cell_type": "markdown", "id": "ee3c71a0", "metadata": { "papermill": { "duration": 0.019857, "end_time": "2026-05-27T03:49:52.979522+00:00", "exception": false, "start_time": "2026-05-27T03:49:52.959665+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 11 · Failure modes, safety, extensions\n", "\n", "### 11.1 · Where this breaks\n", "\n", "| Failure | Mechanism | Mitigation |\n", "|---|---|---|\n", "| **Over-search** | Model keeps calling tools even when it has enough info | System prompt cap (we use this) + `max_rounds` bound |\n", "| **Result drift / hallucination** | Tool result is in context but model answers from parametric knowledge anyway | Force grounding in prompt; switch to **Self-RAG (nb 25)** or **Corrective RAG (nb 24)** |\n", "| **Vague queries** | Model issues queries too generic to retrieve anything useful | Add a thought step → upgrade to **ReAct (nb 03)** |\n", "| **Repeated identical queries** | Agent forgets it already asked and re-asks | Maintain a dedup cache; or use a planner (nb 04) |\n", "| **Tool execution errors** | Tool raises (network down, rate limit) and agent doesn't recover | The library's `web_search_tool` wraps calls in `tenacity` exponential backoff; do the same for tools you write yourself |\n", "| **Prompt injection through tool output** | Tool result contains adversarial text that hijacks the agent | Treat tool output as untrusted; sanitize / quote when re-prompting |\n", "\n", "### 11.2 · Production safety\n", "\n", "- **Cap rounds + recursion limit.** Both are configured by default in this library; never remove them.\n", "- **Whitelist tools.** Don't bind every tool you have; give the agent only the tools relevant to the task. 
Each extra tool widens the prompt-injection surface.\n", "- **Don't let tool output flow to the user verbatim.** Even when the user asks for a \"summary\", the text should be model-generated, not pasted from a search result the user can't see.\n", "- **Add a per-tool timeout.** A hung tool with no timeout will stall the whole graph.\n", "\n", "### 11.3 · Three extensions to try\n", "\n", "1. **Add a Python REPL tool.** Use `agentic_architectures.tools.code_exec.python_repl_tool` to give the agent arithmetic / data-manipulation power. Useful for \"calculate this from the search results\" tasks.\n", "2. **Multi-tool agent.** Bind both `web_search_tool` and a domain-specific tool (e.g., a SQL query tool). Watch how the agent picks between them. This is the path toward **Meta-Controller (nb 11)**.\n", "3. **Swap to ReAct (nb 03).** Same task, with explicit *thought* before each action. Expect the agent's queries to get more specific because it has to write a justification first.\n", "\n", "### 11.4 · What to read next\n", "\n", "- [**03 · ReAct**](./03_react.ipynb) — Tool Use + an explicit reasoning step. The natural next stop.\n", "- [**04 · Planning**](./04_planning.ipynb) — when the task is big enough that planning ahead beats reacting.\n", "- [**06 · PEV**](./06_pev.ipynb) — Tool Use + an automatic verifier that catches bad tool outcomes.\n", "- [**23 · Agentic RAG**](./23_agentic_rag.ipynb) — Tool Use where the tool is a vector retriever and the agent decides when to retrieve.\n", "\n", "### 11.5 · References\n", "\n", "1. Schick, T. et al. *Toolformer: Language Models Can Teach Themselves to Use Tools.* 2023. [arXiv:2302.04761](https://arxiv.org/abs/2302.04761)\n", "2. OpenAI. *Function calling and other API updates.* June 2023. [openai.com/blog/function-calling-and-other-api-updates](https://openai.com/blog/function-calling-and-other-api-updates)\n", "3. 
LangGraph `ToolNode` & `tools_condition` — [official prebuilts docs](https://langchain-ai.github.io/langgraph/reference/prebuilt/)\n", "4. Tavily search API — [tavily.com](https://tavily.com)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" }, "papermill": { "default_parameters": {}, "duration": 42.252407, "end_time": "2026-05-27T03:49:54.324863+00:00", "environment_variables": {}, "exception": null, "input_path": "all-agentic-architectures/notebooks/02_tool_use.ipynb", "output_path": "all-agentic-architectures/notebooks/02_tool_use.ipynb", "parameters": {}, "start_time": "2026-05-27T03:49:12.072456+00:00", "version": "2.7.0" } }, "nbformat": 4, "nbformat_minor": 5 }