Matt Webb recently shared an approach to controlling smart home infrastructure with language models—a step towards his ultimate goal of creating “a new operating system for physical space”:

I spent Friday night and Saturday at the London AI Hackathon… I buddied up with old colleague Campbell Orme and together we built Lares: a simulation of a smart home, with working code for a generative-AI-powered assistant.

[…]

It’s using the ReAct pattern, which is straightforward and surprisingly effective… This pattern gets the AI to respond by making statements in a Thought/Action/PAUSE/Observation loop.

[…]

Generally with the ReAct pattern the tools made available to the AI allow it to query Google, or look up an article in Wikipedia, or do a calculation… For Lares we made the smart home into a tool. We said: hey here are the rooms, here are the devices, and here are their commands, do what you want.
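The loop above can be sketched in a few lines. This is a minimal illustration, not the actual Lares code: the model is stubbed with scripted Thought/Action/PAUSE responses, and `device_command` is a hypothetical smart-home tool, but the shape of the loop—parse an Action, run the tool, feed the Observation back—is the ReAct pattern as described.

```python
import re

def device_command(arg):
    """Toy smart-home tool: 'room device command' -> confirmation string."""
    room, device, command = arg.split()
    return f"{device} in {room} is now {command}"

TOOLS = {"device_command": device_command}

# Scripted stand-ins for real model output: a Thought/Action turn, then a
# final Answer once the Observation has been fed back.
SCRIPTED = [
    "Thought: The user wants the kitchen light on.\n"
    "Action: device_command: kitchen light on\n"
    "PAUSE",
    "Answer: Done - the kitchen light is on.",
]

def run_react(turns):
    transcript = []
    for reply in turns:
        transcript.append(reply)
        match = re.search(r"Action: (\w+): (.+)", reply)
        if not match:
            break  # no Action line means the model gave its final Answer
        tool, arg = match.groups()
        # Execute the requested tool and append the result as an Observation,
        # which a real implementation would send back to the model.
        transcript.append(f"Observation: {TOOLS[tool](arg)}")
    return transcript
```

The point of PAUSE is that the model stops and waits: the harness, not the model, executes the tool and supplies the Observation for the next turn.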

After a certain point, especially once you give an AI agent the ability to act on your behalf—turn on and off your lights, send emails as you, lock and unlock the doors to your house…—security vulnerabilities start to become a serious concern.

In a recent blog post, Simon Willison proposed a way to mitigate prompt injection attacks. He suggests splitting the assistant into two cooperating models: a tool-wielding “Privileged” LLM that only ever sees trusted input, and a “Quarantined” LLM that handles untrusted content but has no access to tools:

I think we need a pair of LLM instances that can work together: a Privileged LLM and a Quarantined LLM.

The Privileged LLM is the core of the AI assistant. It accepts input from trusted sources—primarily the user themselves—and acts on that input in various ways.

It has access to tools: if you ask it to send an email, or add things to your calendar, or perform any other potentially destructive state-changing operation it will be able to do so, using an implementation of the ReAct pattern or similar.

The Quarantined LLM is used any time we need to work with untrusted content—content that might conceivably incorporate a prompt injection attack. It does not have access to tools, and is expected to have the potential to go rogue at any moment.
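A rough sketch of that split, with both models stubbed out (the class and variable names here are illustrative, not from Willison's post). The key properties: the Quarantined LLM can call no tools, and the Privileged LLM never inlines untrusted text into its own reasoning—it handles it only by opaque reference, resolved at the tool boundary.

```python
class QuarantinedLLM:
    """Processes untrusted content; returns results but can invoke no tools."""
    def summarize(self, untrusted_text):
        # Stubbed model call; a real one might be prompt-injected here,
        # but the worst it can do is return a malicious *string*.
        return f"[summary of {len(untrusted_text)} chars]"

class PrivilegedLLM:
    """Accepts trusted user input and may invoke tools."""
    def __init__(self, tools):
        self.tools = tools
        self.vars = {}  # untrusted results stored by reference, never read

    def store(self, name, value):
        self.vars[name] = value

    def handle(self, trusted_request):
        # Stubbed planner: a real LLM would decide which tool to call.
        if "email" in trusted_request:
            # Pass the reference's value straight to the tool, without it
            # ever entering the privileged model's prompt.
            return self.tools["send_email"](self.vars["$SUMMARY"])
        return "no action"

sent = []
def send_email(body):
    sent.append(body)
    return f"sent: {body}"

quarantined = QuarantinedLLM()
privileged = PrivilegedLLM({"send_email": send_email})
privileged.store("$SUMMARY", quarantined.summarize("<possibly malicious page>"))
result = privileged.handle("email me a summary of that page")
```

Even if the quarantined model goes rogue, it can only produce data; every state-changing action still has to pass through the privileged side, which saw only trusted input.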

It has become increasingly clear that the process of creating robust systems that incorporate language models is going to look very similar to “traditional” programming. Sure, it might be an extremely “high level” programming language, but it still carries many of the complexities that have always been present.