Week 5: Tools & Function Calling - 1. Giving LLMs Hands

Tool calling lets a model request app code, receive results, and then answer the user.

Before You Read

Explain tool calling as a contract between the model and application code.
Separate model decision-making from deterministic tool execution.
Identify tasks that require tools instead of pure text generation.

Working Model

A tool-using model does not directly perform side effects. It proposes a structured tool call, your application validates and executes it, then the model uses the returned result to continue or answer.

By default, Large Language Models are "brains in a jar." They are trapped in a text box. They can generate brilliant poetry, write complex code, and reason through difficult problems, but they cannot do anything. They cannot check the current weather, query your production database, send an email, or book a flight.

To build true AI Agents, we need to give LLMs "hands." We do this through a mechanism called Function Calling (or Tool Use).

The Illusion of Execution

A common misconception is that when you give an LLM a tool, the LLM actually runs the code. This is false.

LLMs are text prediction engines; they cannot execute Python scripts or make HTTP requests. Instead, Function Calling is a clever protocol:

You tell the LLM what tools (functions) you have available in your application.
If the user asks a question that requires a tool, the LLM stops generating normal text.
Instead, it generates a structured JSON object containing the name of the tool to use and the arguments to pass to it.
Your application intercepts this JSON, runs the actual code locally, and hands the result back to the LLM.

The Paradigm Shift

Introduced natively in major model APIs in 2023, Function Calling fundamentally changed how we build AI applications.

Before Function Calling, developers had to use complex "prompt engineering" hacks (like the ReAct framework) to coax the LLM into outputting a specific string like Action: Search, Input: Weather in Tokyo, and then use fragile Regular Expressions to parse that string.

Now, models can be given explicit tool definitions and can return structured tool-call arguments matching a schema you define. Strict schema modes make this much more reliable, but your application should still validate every argument before executing code. This makes robust AI systems possible without pretending the model itself is deterministic.