Scott Aaronson, in a lecture at the University of Texas at Austin, describes a project he has been working on at OpenAI to watermark GPT output:

My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT. Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT.

At its core, GPT is constantly generating a probability distribution over the next token to generate… the OpenAI server then actually samples a token according to that distribution—or some modified version of the distribution, depending on a parameter called “temperature.” As long as the temperature is nonzero, though, there will usually be some randomness in the choice of the next token.

So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI.
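The excerpt doesn't spell out the exact selection rule, so here is a minimal sketch of one way such pseudorandom token selection could work, assuming an HMAC-based pseudorandom function over the preceding tokens and a score-based pick (choose the token that maximizes r^(1/p)). The key, function names, and token ids below are invented for illustration; this is not OpenAI's implementation.

```python
import hashlib
import hmac

SECRET_KEY = b"known-only-to-openai"  # hypothetical key for illustration


def prf_score(key: bytes, context: list[int], candidate: int) -> float:
    """Map (context, candidate token) to a pseudorandom float in (0, 1)
    using HMAC-SHA256 keyed with the secret key."""
    message = b"|".join(str(t).encode() for t in context + [candidate])
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") + 0.5) / 2**64


def watermarked_sample(probs: dict[int, float], context: list[int]) -> int:
    """Pick the token that maximizes r ** (1 / p), where r is the
    pseudorandom score and p is the model's probability for that token.
    High-probability tokens still win most of the time, so the output
    looks like ordinary sampling, but anyone holding the key can
    recompute the scores and show they are improbably high."""
    return max(
        probs,
        key=lambda token: prf_score(SECRET_KEY, context, token) ** (1.0 / probs[token]),
    )


# Toy usage: a made-up next-token distribution over three token ids.
next_token_probs = {101: 0.7, 202: 0.2, 303: 0.1}
print(watermarked_sample(next_token_probs, context=[5, 17, 42]))
```

Because the scores are deterministic functions of the key and the text, a verifier who holds the key can re-derive them for any suspect document and run a statistical test on them, which is the “prove later that this came from GPT” step Aaronson describes.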

In my first two posts on the future of AI in education I highlighted proposals that frame generative AI use as an inevitability and, therefore, as a tool to be embraced rather than a threat that warrants the development of technical detection mechanisms. While the risks of plagiarism and misinformation are undoubtedly real, we should push for a greater focus on strengthening critical analysis and fact-checking skills instead of starting an impossible-to-win arms race between detection and evasion.

The most exciting path forward is one where we frame Large Language Models as “a calculator for text”. Just as the invention of pocket calculators was a giant disruption that forced us to re-evaluate our approach to mathematics education, language models will continue to force us to re-evaluate our approach to research and writing. Done correctly, this will open the door for us to learn more quickly, use our time more effectively, and progress further than we could before.