• Ilia Blinderman, from the excellent visual journalism publication The Pudding, shares some of what he has learned about how to create well designed, data-driven essays:

    We have a curious tendency of assuming that people who can do certain things that we cannot are imbued with superior innate talents… This may be especially common for the sort of code-driven interactive data visualizations which we work on, since they rely on an odd grab-bag of skills — critical thought, design, writing, and programming — that people in many other professions may have neither a full awareness of, nor full expertise in.

    […]

    I’m hoping that my putting this guide together will help remove some of the unnecessary mystique surrounding data viz, and demonstrate that the only things that separate a beginner from a speaker on the conference circuit [are] time and practice.

    I recently wrote about the big data visualization project I am currently wrapping up with my students—I wish I had known about this resource earlier!

  • Jieyi Long:

    In this paper, we introduce the Tree-of-Thought (ToT) framework, a novel approach aimed at improving the problem-solving capabilities of auto-regressive large language models (LLMs). The ToT technique is inspired by the human mind’s approach for solving complex reasoning tasks through trial and error.

    Here is the problem: LLMs do not know whether the answer they are currently generating is accurate or optimal. Once they start down a particular path, they are locked in, unable to reconsider unless they are later prompted to.

    Language models do not explicitly perform logical correctness checks as it generates a new token based on the previous tokens. This limits the model’s capacity to rectify its own mistakes. A minor error could be amplified as the model generates more tokens

    Tree-of-thought lets the model explore multiple solutions, backtracking when a particular solution is deemed to be suboptimal. Compared to previous “chain-of-thought” prompting techniques, tree-of-thought gives the LLM more computation time before arriving at a final conclusion.

    As mentioned above, LLMs typically generate a token based on the preceding sequence of tokens without backward editing. On the contrary, when a human solver attempts to solve a problem, she might backtrack to previous steps if a derivation step is incorrect, or if she becomes stuck and is unable to make further progress towards arriving at the final answer.

    […]

    [The tree-of-thought framework] incorporates several components which enhance the problem solving capability of the LLM, including a prompter agent, a checker module, a memory module, and a ToT controller.
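
    Stripped down, those components map onto a fairly ordinary backtracking search. Here is a minimal sketch of the control flow; the propose function (standing in for the prompter agent plus the LLM) and the check function (the checker module) are hypothetical, and this paraphrases the idea rather than reproducing the paper’s code:

        # Loose sketch of a tree-of-thought controller. `propose` and `check` are
        # hypothetical stand-ins; `state` (the partial solution so far) plays the
        # role of the memory module, and the recursion acts as the ToT controller.
        def tree_of_thought(propose, check, state="", depth=0, max_depth=5, branches=3):
            if depth == max_depth:
                return None
            for _ in range(branches):
                step = propose(state)            # prompter: ask the LLM for a next partial step
                verdict = check(state, step)     # checker: solved, plausible, or a dead end?
                if verdict == "solved":
                    return state + step
                if verdict == "plausible":       # controller: explore deeper along this branch
                    result = tree_of_thought(propose, check, state + step,
                                             depth + 1, max_depth, branches)
                    if result is not None:
                        return result
                # otherwise: backtrack and let the prompter try a different step
            return None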

    It is fascinating to think about what studying language models can teach us about our own cognition.


    Related: Loom—A “multiversal tree writing interface for human-AI collaboration”

  • Last month, when the viral AI-generated “Drake” song was blowing up, the musician Grimes told fans that she would split royalties “on any successful AI generated song that uses my voice.”

    I am honestly surprised by the extent to which she has followed through on this.

    Grimes subsequently released software designed to assist in the generation of these songs—“If you go to elf.tech u can upload ur voice singing or record directly into the app… It will output the same audio but with my voice.”

    She recently spoke with Joe Coscarelli at The New York Times about some of the music that has been produced so far.

    Grimes:

    People keep getting really upset, being like, “I want to hear something that a human made!” And I’m like, humans made all of this. You still have to write the song, produce the song and sing the vocal. The part that is A.I. is taking the harmonics and the timbre of the vocal and moving them to be consistent with my voice, as opposed to the person’s original voice. It’s like a new microphone.

  • Sam Altman has spent the past few weeks advocating in favor of AI regulation.

    OpenAI:

    In terms of both potential upsides and downsides, superintelligence will be more powerful than other technologies humanity has had to contend with in the past. We can have a dramatically more prosperous future; but we have to manage risk to get there.

    […]

    We are likely to eventually need something like an IAEA for superintelligence efforts; any effort above a certain capability (or resources like compute) threshold will need to be subject to an international authority that can inspect systems, require audits, test for compliance with safety standards, place restrictions on degrees of deployment and levels of security, etc.

    To be fair, they say open source models are totally fine… as long as they don’t get too good:

    We think it’s important to allow companies and open-source projects to develop models below a significant capability threshold, without the kind of regulation we describe here… the systems we are concerned about will have power beyond any technology yet created, and we should be careful not to water down the focus on them by applying similar standards to technology far below this bar.

    Last week, Altman was in Washington DC discussing these topics with lawmakers.

    Cat Zakrzewski, The Washington Post:

    OpenAI chief executive Sam Altman delivered a sobering account of ways artificial intelligence could “cause significant harm to the world” during his first congressional testimony, expressing a willingness to work with nervous lawmakers to address the risks presented by his company’s ChatGPT and other AI tools.

    Altman advocated a number of regulations — including a new government agency charged with creating standards for the field — to address mounting concerns that generative AI could distort reality and create unprecedented safety hazards.


    In October 2022, Sam Bankman-Fried published a “draft of a set of standards” for the cryptocurrency industry. He had previously been spearheading the effort to lobby Congress to adopt similar regulatory measures industry-wide.

    Less than one month later, his exchange, FTX, declared bankruptcy. He was subsequently indicted for “wire fraud, conspiracy to commit commodities fraud, conspiracy to commit securities fraud and conspiracy to commit money laundering” among other charges.


    Mike Masnick, Techdirt:

    It’s actually kind of typical: when companies get big enough and fear newer upstart competition, they’re frequently quite receptive to regulations… established companies often want those regulations in order to lock themselves in as the dominant players, and to saddle the smaller companies with impossible to meet compliance costs.

    OpenAI should be commended for kickstarting our current generative AI development explosion and they are still, without question, the leader in this space.

    This move should be called out for what it is, though—a blatant ploy for regulatory capture.

  • The anonymous writer behind the brr.fyi blog has been writing about their experience living and working at an Antarctic research station.

    On May 12th, the sun passed below the horizon. It will be dark for the next eight months. From now until August, there will be no visitors and no resupply trips—during the long Antarctic winter the environment is too harsh for safe air travel. Everyone is truly isolated.

    Oh, and when you decide to venture outside, everything is red:

    There are a number of science projects that can only happen during the winter here, because of the unique environmental characteristics of the South Pole (darkness, elevation, weather, air quality, etc). A few of these are extremely sensitive to broad-spectrum visible light.

    […]

    To protect the science experiments, we work hard to avoid any stray broad-spectrum light outside. This means all our station windows are covered, all our exterior navigation lights are tinted red, and we’re only permitted to use red headlamps and flashlights while walking outside.

    Once it becomes closer to fully dark, these lights take on a surreal quality.

    Make sure you check out the photos and videos the author shared. Surreal doesn’t even begin to describe it.

  • The New York City Public School system has decided to reverse its ban on ChatGPT.

    David C. Banks, Chancellor of the New York City Department of Education:

    In November, OpenAI introduced ChatGPT to the public, unleashing the power of generative artificial intelligence and other programs that use vast data sets to generate new and original content. Due to potential misuse and concerns raised by educators in our schools, ChatGPT was soon placed on New York City Public Schools’ list of restricted websites.

    […]

    The knee-jerk fear and risk overlooked the potential of generative AI to support students and teachers, as well as the reality that our students are participating in and will work in a world where understanding generative AI is crucial.

    […]

    While initial caution was justified, it has now evolved into an exploration and careful examination of this new technology’s power and risks.

    New York City Public Schools will encourage and support our educators and students as they learn about and explore this game-changing technology while also creating a repository and community to share their findings across our schools. Furthermore, we are providing educators with resources and real-life examples of successful AI implementation in schools to improve administrative tasks, communication, and teaching.

    Anecdotally, I have heard Snapchat’s My AI feature was a turning point. After a rocky start, it brought generative AI to an app nearly every teenager already has installed on their smartphone.

    Even without Snapchat, Google and Microsoft are currently in the process of integrating generative AI into Docs and Word. Once these features fully roll out, a prohibition on language models will lead to students cheating without ever intending to. Imagine failing a student for using autocorrect.

    The cat is out of the bag.



  • David Pierce tells the story of Google’s AMP initiative:

    After a decade of newspapers disappearing, magazine circulations shrinking, and websites’ business dwindling, the media industry had become resigned to its own powerlessness. Even the most cynical publishers had grown used to playing whatever games platforms like Google and Facebook demanded in a quest for traffic…

    “If Google said, ‘you must have your homepage colored bright pink on Tuesdays to be the result in Google,’ everybody would do it, because that’s what they need to do to survive,” says Terence Eden, a web standards expert and a former member of the Google AMP Advisory Committee.

    […]

    AMP succeeded spectacularly. Then it failed. And to anyone looking for a reason not to trust the biggest company on the internet, AMP’s story contains all the evidence you’ll ever need.

    It seems important that this reporting is coming from The Verge, a publication that was dramatically redesigned last year to shift focus away from external platforms.

    Introducing the redesign, editor-in-chief Nilay Patel wrote:

    Our goal in redesigning The Verge was actually to redesign the relationship we have with you, our beloved audience. Six years ago, we developed a design system that was meant to confidently travel across platforms as the media unbundled itself into article pages individually distributed by social media and search algorithms.

    […]

    But publishing across other people’s platforms can only take you so far. And the more we lived with that decision, the more we felt strongly that our own platform should be an antidote to algorithmic news feeds, an editorial product made by actual people with intent and expertise


    As an aside, when Safari Web Extensions for mobile launched with iOS 15, various “AMP blocker” utilities immediately became hugely popular. I have had one installed ever since. If you don’t, do yourself a favor and install one.

  • Chris Wiley, The New Yorker:

    The artist Charlie Engman is one of the few photographers who have leaned into the alien logic of the new machine age and found a way to make something that feels new.

    […]

    Engman as an A.I. artist is dizzyingly prolific. “The amazing thing about A.I. is that I can make, like, three hundred pictures a day,” he told me, “And every single one of them can be an entirely different set of characters, and new location, and new material. I’m not constrained by physical reality at all.”

    We need a better way to describe AI artwork than “photography.” It is a collage where each element is, at the same time, bespoke and not under your control. It is a new medium. Our conversations will improve once we acknowledge it as such.

    In this context, it is clear that the artistic tools used to create within this new medium are painfully immature. Engman’s “characters” are fleeting apparitions. Any minute tweak to your seed number, token choice, or model architecture, and they are gone forever.

    We should be able to recompose a frame, resituate characters in a different environment, give them a haircut, and change their wardrobe, pose, and expression. Characters should have a history. Maybe one is afraid of heights. That should be remembered next time you pose them crossing a mountain overpass. The ways that characters relate to each other and their environment should have a consistent logic while continuing to allow for surprising choices on the part of the model.

    Technologies such as ControlNet could begin to help here, but they are still early and only accessible to those with a technical background.
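
    To give a sense of what “a technical background” currently buys you: a rough sketch using the Hugging Face diffusers library with an openpose ControlNet checkpoint. The prompt, pose image, and filenames are placeholders, and holding the seed fixed is what keeps a “character” from dissolving between runs:

        import torch
        from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
        from diffusers.utils import load_image

        # Pose-conditioned generation: the ControlNet pins the character's pose,
        # and the fixed seed keeps the rest of the "character" roughly stable.
        controlnet = ControlNetModel.from_pretrained(
            "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
        )
        pipe = StableDiffusionControlNetPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
        ).to("cuda")

        pose = load_image("character_pose.png")                 # placeholder pose reference
        generator = torch.Generator("cuda").manual_seed(1234)   # change this and the character is gone

        image = pipe(
            "portrait of a weathered sea captain, 35mm film",   # placeholder prompt
            image=pose,
            generator=generator,
        ).images[0]
        image.save("captain.png")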

    Or maybe this is all the wrong attitude to take. Maybe treating AI artwork as a new medium means embracing a certain loss of control.

    It is too early to say.

  • To celebrate Global Accessibility Awareness Day, Apple announced a handful of forthcoming iOS features. One of these features really caught my eye because I think it gives us a tiny glimpse at what Apple’s platforms might look like in the future:

    Coming later this year… those at risk of losing their ability to speak can use Personal Voice to create a synthesized voice that sounds like them for connecting with family and friends.

    With Live Speech on iPhone, iPad, and Mac, users can type what they want to say to have it be spoken out loud during phone and FaceTime calls as well as in-person conversations… For users at risk of losing their ability to speak — such as those with a recent diagnosis of ALS… Personal Voice is a simple and secure way to create a voice that sounds like them.

    Users can create a Personal Voice by reading along with a randomized set of text prompts to record 15 minutes of audio on iPhone or iPad. This speech accessibility feature uses on-device machine learning to keep users’ information private and secure, and integrates seamlessly with Live Speech so users can speak with their Personal Voice when connecting with loved ones.

    Apple’s custom silicon expertise currently gives them a huge advantage when it comes to training and running personalized machine learning models locally on customers’ devices.

    I would love to see a “Siri 2.0” that utilizes on-device language models. However, as we get closer to WWDC, it has become increasingly clear that this is not the year for that. This year will almost certainly be dominated by the unveiling of their rumored XR headset. But even setting the headset aside, Apple tends to be very slow and methodical when it comes to changes as large as the ones a major Siri revision would require.

    Nevertheless, I expect to see more incremental progress towards expanding on-device AI models throughout all of Apple’s platforms. We just might have to wait until iOS 18 or 19 for the big stuff.

  • Douglas, writing for the blog A Mindful Monkey, shares some observations on the task of raising a newborn:

    She is very curious

    I already pretty much know how the world works. I don’t need to create mental models of what happens when a rubber duck gets put in a cup and taken out x 100 times like she does. I don’t have a desire to poke my finger into the ethernet port. I can’t imagine having to start from ground zero and develop an understanding of everything, but I guess I did it once and so does she.

    […]

    Within the last month, she seems to have created the concept of ‘handle’ in her mind.

    Instead of grabbing objects wherever her hand lands, she reaches for parts that are best fit to grab; even if the object is new to her. The concept of a handle seems pretty innate to me and I didn’t notice it until watching my daughter. It’s just instinctual that things have handles (or at least the best ways to grab them). It’s like she has developed a sense of physics (gravity, torque, etc) without actually knowing what those things are at all.

    One more, from a follow-up post:

    One time I laid down to see what her mobile looked like from her perspective when she was an infant. It looked drastically less cool from her vantage point. She has since grown and is now at knee height. I wonder about how different every room feels from that height. The kitchen is towering over you, with no idea of what is going on up there. A refrigerator so tall you can only see the bottom row of food. You can never open a door.

  • Just last Monday, I was commending Mosaic for their “StoryWriter” language model with its huge 65,000-token context window. Next to that, OpenAI’s forthcoming 32,000-token large-context version of GPT-4 no longer looked so impressive.

    Well…

    Anthropic:

    We’ve expanded Claude’s context window from 9K to 100K tokens… The average person can read 100,000 tokens of text in ~5+ hours, and then they might need substantially longer to digest, remember, and analyze that information. Claude can now do this in less than a minute.

    […]

    Beyond just reading long texts, Claude can help retrieve information from the documents that help your business run. You can drop multiple documents or even a book into the prompt and then ask Claude questions that require synthesis of knowledge across many parts of the text.

    With Google’s announcements last week and Anthropic’s steady stream of improvements, I am beginning to wonder what OpenAI has up its sleeve.
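
    For a rough sense of scale, a back-of-the-envelope sketch using the common heuristic of about four characters per token for English prose (an approximation, not Anthropic’s actual tokenizer):

        # Roughly how much of a 100K-token window does a text occupy?
        def estimated_tokens(text: str) -> int:
            return len(text) // 4   # ~4 characters per token for English prose

        # A 70,000-word novel is on the order of 400,000 characters, i.e. roughly
        # 100,000 tokens: about one full book per prompt, versus the 9K window
        # Claude had before and the 32K of the large-context GPT-4.
        novel = open("novel.txt", encoding="utf-8").read()      # placeholder file
        print(estimated_tokens(novel), "of 100,000 tokens")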

  • Amy Goodchild published a comprehensive primer on the emergence of computer art in the 1950s and 60s. It is inspiring to see the depth of what artists were able to create using primitive, inaccessible, and expensive technologies compared to what is available today.

    Artists have a long history of pushing the boundaries of what is possible. In recent years, the space of possibilities has expanded dramatically. I hope there are creative pioneers working today to push us even further.

  • The Machine Learning Compilation blog:

    Significant progress has been made in the field of generative artificial intelligence and large language models… As it stands, the majority of these models necessitate the deployment of powerful servers to accommodate their extensive computational, memory, and hardware acceleration requirements.

    […]

    MLC-LLM [is] a universal solution that takes the ML compilation approach and brings LLMs onto diverse set of consumer devices… To make our final model accelerated and broadly accessible, the solution maps the LLM models to vulkan API and metal, which covers the majority of consumer platforms including windows, linux and macOS… Finally, thanks to WebGPU, we can offload those language models directly onto web browsers. WebLLM is a companion project that leverages the ML compilation to bring these models onto browsers.

    Their iOS app is powered by the Vicuna 7B language model. I was genuinely shocked by the inference speed on my iPhone 14 Pro. The response quality is roughly equivalent to MPT, StableLM, and other similar open source projects—in other words, not particularly great. But, again, all of this is running locally on a phone—that is a truly impressive feat.

    One of the example use-cases from the linked announcement is a bespoke AI assistant that is trained on each individual user’s private data. This personalized assistant should run locally for privacy and security reasons, but it doesn’t have to be particularly powerful as long as it can offload difficult tasks to a more powerful, centralized assistant in a privacy-preserving manner.

    A pattern very similar to Simon Willison’s recent proposal for “Privileged” and “Quarantined” LLMs would be key here.

    In this scenario, it is less important for local models to be powerful than it is for them to be fast and energy efficient. MLC could be a step towards making this a reality.
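
    A sketch of that division of labor; every callable here is a hypothetical stand-in (a small on-device model, a hosted frontier model, and some form of redaction):

        # `local_llm` is a small on-device model (e.g. something served via MLC),
        # `remote_llm` is a hosted frontier model, and `redact` scrubs personal
        # details before anything leaves the device. All three are stand-ins.
        def answer(local_llm, remote_llm, redact, question: str) -> str:
            draft = local_llm(
                "Answer the question if you can. If it is beyond you, reply with "
                f"exactly ESCALATE and nothing else.\n\nQuestion: {question}"
            )
            if draft.strip() != "ESCALATE":
                return draft                      # handled entirely on-device
            return remote_llm(redact(question))   # only a scrubbed version leaves the phone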

  • AI researchers from Google’s DeepMind team used reinforcement learning to teach simulated robots how to play soccer. They then managed to transfer those learned abilities onto real, physical robots. We are left with not only impressive research, but a bunch of videos of super cute, tottering robots playing soccer with each other.

    DeepMind:

    Our agents, with 20 actuated joints, were trained in simulation using the MuJoCo physics engine, and transferred zero-shot to real robots… The trained soccer players exhibit robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more… The agents also developed a basic strategic understanding of the game, learning to anticipate ball movements and to block opponent shots.

    As I mentioned above, the linked page is full of videos that are all fascinating and impressive; if nothing else, don’t miss the clip where a researcher repeatedly pushes down an adorable, tiny, humanoid robot as it runs around playing with a soccer ball. Her discomfort with each push is palpable.

  • While I wish development efforts would coalesce around a single, capable, open source language model, it is undoubtedly interesting to see the variety of new entrants in this space.

    Mosaic:

    Introducing MPT-7B, the latest entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B… we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+

    The base model, the instruction-tuned model, and the 65k model are all commercially licensed. The chat model has a Creative Commons non-commercial license because it was finetuned on data from OpenAI.

    I tried out the chat model on Hugging Face. At first glance, it seems comparable to other open source models I have previously used.
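
    For anyone who would rather run it locally, loading the checkpoint looks roughly like this, assuming the Hugging Face transformers library and the mosaicml/mpt-7b-chat weights (MPT ships custom model code, hence trust_remote_code):

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        name = "mosaicml/mpt-7b-chat"  # or mpt-7b / mpt-7b-instruct / mpt-7b-storywriter
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForCausalLM.from_pretrained(
            name, torch_dtype=torch.bfloat16, trust_remote_code=True
        )

        inputs = tokenizer("What is a context window?", return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=100)
        print(tokenizer.decode(output[0], skip_special_tokens=True))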

    The most surprising announcement here is certainly StoryWriter-65k. The (still unreleased) large context-length version of GPT-4 will be able to handle 32,000 tokens—less than half of what is possible here.

  • Yesterday, SemiAnalysis shared a leaked document that purports to be an internal memo written by an engineer at Google:

    We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be?

    But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch.

    I’m talking, of course, about open source. Plainly put, they are lapping us… While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.

    The anonymous engineer attributes many of these open source developments to the leak of Meta’s LLaMA model:

    At the beginning of March the open source community got their hands on their first really capable foundation model, as Meta’s LLaMA was leaked to the public. It had no instruction or conversation tuning, and no RLHF. Nonetheless, the community immediately understood the significance of what they had been given.

    A tremendous outpouring of innovation followed, with just days between major developments… Here we are, barely a month later, and there are variants with instruction tuning, quantization, quality improvements, human evals, multimodality, RLHF, etc. etc. many of which build on each other.

    Indeed, while open source language models still lag behind state of the art closed models, the speed of development is unparalleled. The quality gap has already closed with text-to-image models and there is no reason to think the same won’t happen with LLMs.

    To be clear, I think centralized models will remain important, if only for thin clients where compute power is limited. Who knows, though; it is always possible that open source development ends up commoditizing language models to the point where there is no reason to call OpenAI’s API over a random AWS endpoint.

    The author concludes that, for this reason, Google should contribute to the open source community instead of attempting to compete with it:

    This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?

    […]

    The more tightly we control our models, the more attractive we make open alternatives. Google and OpenAI have both gravitated defensively toward release patterns that allow them to retain tight control over how their models are used. But this control is a fiction. Anyone seeking to use LLMs for unsanctioned purposes can simply take their pick of the freely available models.

    Google should establish itself a leader in the open source community, taking the lead by cooperating with, rather than ignoring, the broader conversation. This probably means taking some uncomfortable steps, like publishing the model weights for small ULM variants. This necessarily means relinquishing some control over our models. But this compromise is inevitable. We cannot hope to both drive innovation and control it.

    A pivot towards open source is a great strategy that would clearly differentiate Google from OpenAI at a time when Google sorely needs it. But remember, this memo was written by a Google engineer, not someone from Google’s leadership—the higher-ups at Google appear to be moving in the opposite direction.

    Nitasha Tiku, Washington Post:

    In February, Jeff Dean, Google’s longtime head of artificial intelligence, announced a stunning policy shift to his staff: They had to hold off sharing their work with the outside world.

    For years Dean had run his department like a university, encouraging researchers to publish academic papers prolifically; they pushed out nearly 500 studies since 2019

    […]

    Things had to change. Google would take advantage of its own AI discoveries, sharing papers only after the lab work had been turned into products, Dean said, according to two people with knowledge of the meeting

    If leadership wants to be proactive, the clock is ticking. During Meta’s most recent earnings call, Mark Zuckerberg made it clear that they intend to embrace an open source approach to AI moving forward.

    Mark:

    Right now most of the companies that are training large language models have business models that lead them to a closed approach to development. I think there’s an important opportunity to help create an open ecosystem. If we can help be a part of this, then much of the industry will standardize on using these open tools and help improve them further.

    […]

    I mentioned LLaMA before and I also want to be clear that while I’m talking about helping contribute to an open ecosystem, LLaMA is a model that we only really made available to researchers and there’s a lot of really good stuff that’s happening there. But a lot of the work that we’re doing, I think, we would aspire to and hope to make even more open than that. So, we’ll need to figure out a way to do that.

    If Google remains in stasis for much longer, an open-source-first philosophy will no longer be unique—it will look like Google decided to start copying Meta instead of OpenAI.

  • Topher Sanders and Dan Schwartz, ProPublica:

    Every day across America, [trains] park in the middle of neighborhoods and major intersections, waiting to enter congested rail yards or for one crew to switch with another. They block crossings, sometimes for hours or days, disrupting life and endangering lives.

    […]

    In Hammond [Indiana], the hulking trains of Norfolk Southern regularly force parents, kids and caretakers into an exhausting gamble: How much should they risk to get to school?

    For their part, the Norfolk Southern executives sound like lovely people:

    [The Hammond school district] has asked Norfolk Southern for its schedule so that the schools can plan for blockages and students can adjust their routines. The company has disregarded the requests, school officials said.

    Mayor Thomas McDermott Jr. said that his experience with the rails has been similar, and that company officials have reminded him the rails “were here first,” running through Hammond before it was even a city.

    Click through to the article for some of the most powerful and nerve-racking photojournalism I have ever seen.

    Less than a week after publishing the above investigation, Sanders and Schwartz wrote a brief follow-up:

    Within 48 hours of an investigation about children having to crawl under parked trains to get to school in an Indiana suburb, residents packed a public meeting to demand solutions, the Federal Railroad Administration issued a safety advisory, a bipartisan group of Indiana lawmakers sent a letter to the U.S. Department of Transportation pleading for change and Norfolk Southern’s CEO, Alan Shaw, got involved.

    […]

    The day after the story was published, [Mayor Thomas McDermott Jr.] got a call from [Norfolk Southern’s CEO, Alan Shaw], who told him he was shocked by the situation in Hammond and wanted to help fix it. “I don’t want to divulge too much about what we talked about, but if it works out the way I hope it does, it will be spectacular,” the mayor said.

  • Inflection is a new AI startup run by Mustafa Suleyman, Karén Simonyan, and Reid Hoffman. Their new language model, Pi, isn’t designed to be the most technically advanced AI on the market. The goal is, instead, to create the most friendly and conversational bot available—“more like a sounding board than a repackaged Wikipedia answer.”

    Alex Konrad, Forbes:

    Named Pi for “personal intelligence,” Inflection’s first widely released product — made available today… is supposed to play the active listener, helping users talk through questions or problems over back-and-forth dialog it then remembers, seemingly getting to know its user over time. While it can give fact-based answers, it’s more personal than OpenAI’s GPT-4… without the virtual companionship veering into unhealthy parasocial relationships reported by some users of Replika bots.

    I tried the Pi iOS app. From my brief interactions so far, the AI seems fine. My biggest critique would be that it appears to be both a bit bland and quite repetitive at the moment—one of its first responses included the phrase “are you pulling my non-existent leg?”, which also appears in the linked Forbes article.

    I find the overall idea of a personable LLM fascinating, though. I was immediately reminded of a recent Stratechery interview where Daniel Gross made the case for a “funny” language model:

    …The sad thing to me, and actually the really alarming thing to me, is not the capability of the models or whether it’s connected to the Internet or not. To me, it’s the fact that… no one has really spent time making them sort of wonderful and fun in a Pixar way. We don’t have a John Lasseter or a Walt Disney who’s really focused on the technology but also the enjoyment of the model.

    […]

    I think we’re missing… people that can really think deeply about how to make a very funny LLM. I’ve been shouting at Nat and anyone else who will listen to me that we need to find someone making a really funny language model… there are many more papers about LLMs doing math than LLMs being funny. But I think actually being funny is much more important and I would argue, broadly, a very important direction, if you think about broader AI safety risk and all that sort of stuff, it should feel like as if we’re creating the world’s best pet, not the world’s smartest actuary.

    I do not think Inflection has achieved Daniel’s goal yet, but I am glad someone is out there trying.

    After all, we rarely view our personal relationships as a meritocracy—we value kindness, empathy, and humor over raw intelligence. Going back to technology, Apple is one of the biggest companies in the world. Is it because they sell the most cutting edge technology or deliver the greatest price-performance ratio? Absolutely not—Apple is so successful because they understand design is more important than pure technical benchmarks.

    The Pi app is wonderfully well designed; now they just need to continue working on the model.

  • Researchers at MIT and Columbia University recently published a study measuring whether an AI language model that answers questions using the Socratic method produces higher levels of critical thinking and metacognition than standard language model interactions.

    I tested something similar after OpenAI used a “Socratic tutor” prompt to demonstrate the “steerability” of GPT-4. I found the experience to be surprisingly transformative compared to the standard ChatGPT “Q&A” workflow.
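
    The “steering” in question is nothing more exotic than a system prompt. A rough sketch, assuming the openai Python package’s chat interface as it existed at the time (and an OPENAI_API_KEY in the environment); the prompt wording is mine, not OpenAI’s:

        import openai  # reads OPENAI_API_KEY from the environment

        # A Socratic tutor is mostly a system prompt: the model is told to ask
        # guiding questions instead of handing over answers.
        messages = [
            {"role": "system", "content": (
                "You are a Socratic tutor. Never give the answer directly. "
                "Respond only with short questions that guide the student toward it."
            )},
            {"role": "user", "content": "Why does the sky look blue?"},
        ]

        response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
        print(response["choices"][0]["message"]["content"])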

    Valdemar Danry, Pat Pataranutaporn, Yaoli Mao, and Pattie Maes:

    AI models can be biased, deceptive, or appear more reliable than they are, leading to dangerous decision-making outcomes… This is especially concerning when AI systems are used in conjunction with humans, as people have a tendency to blindly follow the AI decisions and stop using their own cognitive resources to think critically.

    […]

    This paper presents the novel idea of AI-framed Questioning inspired by the ancient method of Socratic questioning that uses intelligently formed questions to provoke human reasoning, allowing the user to correctly discern the logical validity of the information for themselves. In contrast to causal AI-explanations that are declarative and have users passively receiving feedback from AI systems, our AI-framed Questioning method provides users with a more neutral scaffolding that leads users to actively think critically about information.

    […]

    Our results show that AI-framed Questioning increase the discernment accuracy for flawed statements significantly over both control and causal AI-explanations of an always correct AI system.

    Assuming the results here are accurate, what are the implications for traditional pedagogy?

  • James Reeves:

    The other day I was sleepy and, after a long day of grading, I thought I was reading an especially uninspired essay about Walter Benjamin’s “Work of Art in the Age of Mechanical Reproduction,” and I responded with a few hundred words of feedback and questions. Then it hit me. With gritted teeth, I pasted the essay into a text box, and yep, three of the algorithms that check the other algorithms delivered a 92% result.

    And it was about the goddamned work of art in the goddamned age of mechanical reproduction, of all things. Perhaps I should just enjoy the beautiful irony here, but the image of myself spending my brief time on this planet thoughtfully reading and responding to the patterns of an algorithm fills me with a horror that edges toward the existential.

  • Matt Webb recently shared an approach to controlling smart home infrastructure with language models—a step towards his ultimate goal of creating “a new operating system for physical space”.

    I spent Friday night and Saturday at the London AI Hackathon… I buddied up with old colleague Campbell Orme and together we built Lares: a simulation of a smart home, with working code for a generative-AI-powered assistant.

    […]

    It’s using the ReAct pattern, which is straightforward and surprisingly effective… This pattern gets the AI to respond by making statements in a Thought/Action/PAUSE/Observation loop

    […]

    Generally with the ReAct pattern the tools made available to the AI allow it to query Google, or look up an article in Wikipedia, or do a calculation… For Lares we made the smart home into a tool. We said: hey here are the rooms, here are the devices, and here are their commands, do what you want.
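
    For anyone who has not seen it, the loop really is that small. A sketch of the pattern with one made-up smart home tool and the model call left as a stand-in; this paraphrases the pattern rather than Lares’s actual code:

        import re

        # One made-up smart home "tool" the model is allowed to call.
        def set_light(argument: str) -> str:
            room, state = argument.split()
            return f"The {room} light is now {state}."

        TOOLS = {"set_light": set_light}
        ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)

        SYSTEM = (
            "You run in a Thought / Action / PAUSE / Observation loop.\n"
            "Think with Thought, then emit 'Action: set_light: <room> <on|off>' and PAUSE.\n"
            "You will be called again with 'Observation: <result>'.\n"
            "When you are done, reply with a plain answer and no Action."
        )

        def react(ask_llm, request: str, max_turns: int = 5) -> str:
            # ask_llm(system_prompt, transcript) -> str is a stand-in for any chat model.
            transcript = f"Request: {request}"
            for _ in range(max_turns):
                reply = ask_llm(SYSTEM, transcript)
                transcript += f"\n{reply}"
                action = ACTION_RE.search(reply)
                if not action:                         # no Action means a final answer
                    return reply
                tool, argument = action.groups()
                observation = TOOLS[tool](argument)    # run the requested tool
                transcript += f"\nObservation: {observation}"
            return transcript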

    After a certain point, especially once you give an AI agent the ability to act on your behalf—turn on and off your lights, send emails as you, lock and unlock the doors to your house…—security vulnerabilities start to become a serious concern.

    In a recent blog post, Simon Willison proposed a potential solution to prompt injection attacks. He suggests splitting the assistant into two cooperating instances: a privileged LLM that accepts trusted input and has access to tools, and a quarantined LLM that handles untrusted content and has no tools:

    I think we need a pair of LLM instances that can work together: a Privileged LLM and a Quarantined LLM.

    The Privileged LLM is the core of the AI assistant. It accepts input from trusted sources—primarily the user themselves—and acts on that input in various ways.

    It has access to tools: if you ask it to send an email, or add things to your calendar, or perform any other potentially destructive state-changing operation it will be able to do so, using an implementation of the ReAct pattern or similar.

    The Quarantined LLM is used any time we need to work with untrusted content—content that might conceivably incorporate a prompt injection attack. It does not have access to tools, and is expected to have the potential to go rogue at any moment.
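
    Compressed a great deal, the arrangement looks something like this; both model calls are hypothetical stand-ins, and Simon’s actual proposal adds more machinery around variables and tool dispatch:

        # Sketch of the Privileged / Quarantined split. `ask_privileged` and
        # `ask_quarantined` are hypothetical stand-ins for two separate model instances.
        def handle(ask_privileged, ask_quarantined, user_request: str, untrusted_text: str) -> str:
            # The Quarantined LLM is the only one that ever reads untrusted content,
            # and it has no tools, so an injected instruction has nothing to hijack.
            summary = ask_quarantined(f"Summarize this document:\n\n{untrusted_text}")

            # The Privileged LLM (the one with tools) sees only trusted input plus an
            # opaque reference to the quarantined output, never the content itself.
            plan = ask_privileged(
                f"User request: {user_request}\n"
                "The untrusted document's summary is available as $SUMMARY.\n"
                "Refer to it only by that name."
            )

            # The controlling code, not an LLM, performs the substitution at the end.
            return plan.replace("$SUMMARY", summary)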

    It has become increasingly clear that the process of creating robust systems that incorporate language models is going to look very similar to “traditional” programming. Sure, it might be an extremely “high level” programming language, but it still carries many of the complexities that have always been present.

  • Benjamin Mayo:

    Humane, the secretive startup founded by ex-Apple software design chief Imran Chaudri, finally went public with Chaudri showing off their device for the first time at the TED conference last week…

    Chaudri’s talk is centered on the premise that technology (mainly through the smartphone) has invaded all of our lives too much. The idea is that personalized artificial intelligence can be used to dramatically change how we interact with technology. Rather than proactively opening an app to do something, AI can be an ambient thing that is there when you need it, works in the background of your life, and mostly stays out of your way.

    To make this a reality, Humane is introducing a new product: a wearable that resembles a rectangular pin badge. Chaudri is wearing one on his jacket pocket during the presentation. He sets out the vision of their product as something that is “screenless, seamless and sensing”.

    There is something fundamentally cool about Humane’s product—it just feels like a device from the future.

    The problem is Chaudri’s insistence that their device is a replacement for smartphones. John Gruber recently wrote a great piece about this:

    So far, it feels like Humane’s entire premise is founded on that same mistake: building a new device intended to replace our phones, without that new device being able to do any of the dozens of things we love to do on our phones that require a display. Apple Watch and AirPods thrive because they’re satellites to our iPhones, not ostensible replacements… Anything that attempts to establish a post-phone beachhead has to do the things we love to do with our phones, or entertain us in new ways that make us forget about them. I don’t see how a laser projector on a chest badge does that.

    Humane is so close to building the product I have been dreaming of. But for them to succeed, they first need to accept that, until their device is ten times better than the smartphone, it won’t supersede smartphones as the center of personal computing.

    Honestly, Humane should consider scrapping the whole projector idea and focusing their efforts on making an incredible app. But if Chaudri really wants to build a viable hardware product, it must be an accessory to the smartphone.

    There is a sense in which any hardware project is doomed from the start, though. If Humane were ever to create a wearable that sees widespread success, Apple would undoubtedly sherlock the technology and incorporate it into a “next generation” Apple Watch—I am sure they already have a similar R&D project on the back burner, just in case.

    At the end of the day, I am rooting for Humane but that doesn’t mean I am optimistic.

  • Sarah Perez, TechCrunch:

    Launched last week to global users after initially being a subscriber-only addition, Snapchat’s new AI chatbot powered by OpenAI’s GPT technology is now pinned to the top of the app’s Chat tab where users can ask it questions and get instant responses. But following the chatbot’s rollout to Snapchat’s wider community, Snapchat’s app has seen a spike in negative reviews amid a growing number of complaints shared on social media.

    Over the past week, Snapchat’s average U.S. App Store review was 1.67, with 75% of reviews being one-star, according to data from app intelligence firm Sensor Tower.

    I was optimistic about Snapchat’s My AI feature when it initially launched last month:

    Snapchat has a new AI chatbot. They are, in hindsight, the perfect company to experiment with personality-driven chat. They have a younger user base, less fear of upsetting a stodgy corporate audience, and a history of being an early adopter to strange new technologies.

    That was evidently an incorrect analysis. While it might be true that Snapchat the company is well positioned to experiment with emerging technologies, the Snapchat user base certainly doesn’t universally appreciate being subjected to these experiments.

    On further reflection, I think the general principle that I wrote about a few weeks ago in regard to Google can be applied more broadly:

    Generative AI is a fundamentally new technology; therefore, you should allow that to guide you into new products that were impossible or impractical previously. Attempting to shoehorn AI into existing products will be awkward, at best.

    At the very least, if you are committed to the ill-advised “shoehorn” strategy, you should make these new features optional, ideally opt-in. No one appreciates it when a well-known user interface suddenly changes—no matter the reason that prompted the change.

  • Nilay Patel, The Verge:

    Here’s the basics: there’s a new track called “Heart on My Sleeve” by a TikTok user called @ghostwriter877 with AI-generated vocals that sound like Drake and The Weeknd.

    […]

    This prompted Drake and The Weeknd’s label Universal Music Group to issue a sternly-worded statement about the dangers of AI, which specifically says that using generative AI infringes its copyrights.

    […]

    The first legal problem with using AI to make a song with vocals that sound like they’re from Drake is that the final product isn’t a copy of anything… Instead, UMG and Getty Images and publishers around the world are claiming that collecting all the training data for the AI is copyright infringement

    The bottom line is that there is no clear precedent in place dictating the way generative AI relates to existing copyright law. This leaves a bit of a grey zone that creators, for now, are free to explore.

    Martine Paris, Forbes:

    In the wake of the AI-generated hit Heart on My Sleeve going viral with deepfakes of multi-platinum artists Drake and The Weeknd, pop star Grimes has invited her fans to create music with her voice.

    On Sunday night she tweeted, “I’ll split 50% royalties on any successful AI generated song that uses my voice. Same deal as I would with any artist i collab with. Feel free to use my voice without penalty. I have no label and no legal bindings.”

    Chloe Xiang, Vice:

    A Discord server called AI Hub hosts a large community of AI music creators behind some of the most viral AI songs. This server was created on March 25 and now has over 21,000 users.

    […]

    UTOP-AI, the album created by the Discord community, features original songs using AI-generated vocals from famous artists including Travis Scott, Drake, Baby Keem, and Playboi Carti. Qo, Snoop Dogg, and twenty other people involved in the AI Hub community worked on it. 

    This album puts into practice what drew Qo and Dogg to AI music in the first place—the ability to create material for artists they wish to hear more of.

  • Nearly six months after the launch of ChatGPT—and after the subsequent releases of Bard, Bing, Claude, LLaMA, and StableLM—large user-generated-content companies are, one after another, closing off access to their data for training AI models.

    Paresh Dave, Wired:

    Stack Overflow, a popular internet forum for computer programming help, plans to begin charging large AI developers as soon as the middle of this year for access to the 50 million questions and answers on its service, CEO Prashanth Chandrasekar says.

    Mike Isaac, The New York Times:

    [Reddit] said on Tuesday that it planned to begin charging companies for access to its application programming interface, or A.P.I.

    […]

    Mr. Huffman said Reddit’s A.P.I. would still be free to developers who wanted to build applications that helped people use Reddit… But for the A.I. makers, it’s time to pay up.

    Kif Leswing, CNBC:

    Twitter CEO Elon Musk threatened to sue Microsoft on Wednesday, accusing the software giant of illegally using the social media company’s data to train its artificial intelligence model.

    […]

    Musk said in December that Twitter would “pause” OpenAI’s access to its database.

    It is actually unlikely that new training data from any of these companies will be necessary any time soon. Language models need a huge amount of text in order to learn basic grammar, writing styles, and general facts. Specific, up-to-date information, on the other hand, is best integrated by plugging in external tools.

    After a while, though, it will be necessary to update the foundation models’ training data. When that happens, large companies that can either pay for API access or strike data-exchange deals will benefit disproportionately.
