-
Benjamin Mullin and Katie Robertson, The New York Times:
BuzzFeed is shutting down its news division as part of an effort to cut 15 percent of its work force, the company’s chief executive, Jonah Peretti, said Thursday in a memo to employees.
[…]
BuzzFeed will continue to publish news on HuffPost, which Mr. Peretti said in his memo was profitable and less dependent on social platforms. He added that the company was moving forward “only with parts of the business that have demonstrated their ability to add to the company’s bottom line.”
Peretti evidently does not appreciate that BuzzFeed News' true value is not reflected in the revenue it generates. BuzzFeed News gives the entire “BuzzFeed” brand a degree of legitimacy and esteem it would not otherwise have.
Before News began publishing serious journalism and winning Pulitzers, BuzzFeed was (appropriately) synonymous with low-quality listicles, quizzes, and clickbait.
The entire conceit was that BuzzFeed.com was the “junk food” that funded important investigative journalism—what is BuzzFeed’s purpose without News?
In other words, BuzzFeed is losing an essential part of its mullet, as Josh Marshall puts it:
The journalism played an even more niche, operational role. Buzzfeed mastered the distribution element of social media very, very fast. But it had listicles and cat photos and other stuff like that. That’s tons of traffic. But it’s not the prestige play that brings you top shelf premium ad dollars. The journalism was really a loss-leader in that calculus. GM or Bacardi isn’t going to sign on to the be the exclusive sponsor of your Grumpy Cat slideshow, even if millions see it. But put a Pulitzer in the mix and it’s a very different story. There was always a big mullet aspect to these plays: prestige up front (news reporting), party in the back (listicles and memes).
-
Speaking of open source language models…
Today, Stability AI released a new open-source language model, StableLM. The Alpha version of the model is available in 3 billion and 7 billion parameters, with 15 billion to 65 billion parameter models to follow. Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.
[…]
StableLM is trained on a new experimental dataset built on The Pile, but three times larger with 1.5 trillion tokens of content. We will release details on the dataset in due course. The richness of this dataset gives StableLM surprisingly high performance in conversational and coding tasks, despite its small size.
Unlike LLaMA, the base model is completely free to use commercially. The instruction tuned model, however, is only licensed for noncommercial research.
We are also releasing a set of research models that are instruction fine-tuned. Initially, these fine-tuned models will use a combination of five recent open-source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.
This limitation will likely only be temporary, though, as Stability appears to be working on putting together a new instruction tuning / RLHF dataset that will presumably be permissively licensed.
We will be kicking off our crowd-sourced RLHF program, and working with community efforts such as Open Assistant to create an open-source dataset for AI assistants.
Remember, instruction tuning is what allows your prompts to be natural and conversational. For example, you might prompt the base model with “here is a list of ten dog breeds: 1)” while you could prompt the instruction tuned model “write a list of ten dog breeds.”
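The distinction above is easy to mechanize. Here is a minimal sketch of the two prompting styles; the helper names and the exact prompt wording are my own, not any real API:

```python
# With a base model, you must phrase the request as text the model can
# continue; with an instruction-tuned model, you just state the request.

def as_base_prompt(instruction: str) -> str:
    """Rewrite an instruction as a completion-style prompt for a base model."""
    # A base model only continues text, so we start the answer for it.
    return f"Here is the result of the task '{instruction}':\n1)"

def as_instruct_prompt(instruction: str) -> str:
    """An instruction-tuned model can be given the request directly."""
    return instruction

task = "write a list of ten dog breeds"
print(as_base_prompt(task))      # completion-style: ends mid-list, inviting continuation
print(as_instruct_prompt(task))  # conversational: the request itself
```

Instruction tuning, in other words, moves the burden of phrasing from the user to the model.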
Overall, this release is a huge deal if only because it creates the obvious Schelling point for future open source development work. When it was first released, Stable Diffusion was resource intensive and low quality. After a flurry of open source contributions, it quickly became the highest quality option while, at the same time, becoming efficient enough to run locally on an iPhone. If the same story occurs with StableLM, this will become a more important release than GPT-4.
-
From Bret Devereaux’s excellent series on the history and mechanics of farming:
In places where seed-drilling devices weren’t available, seeds were sown by the broadcast method. The ground was plowed, then the seeds were thrown out over the ground (literally cast broadly; this is where our term broadcast comes from); the ridges created by plowing would cause most of the seeds to fall into the grooves (called furrows; thus a ‘furrowed’ brow being one scrunched up to create ridges and depressions that looked like a plowed field), creating very rough rows of crops once those seeds sprouted. Then the land is then harrowed (where our sense of ‘harrowing‘ comes from – seriously, so much English idiomatic expressions are farming idioms, for obvious reasons), typically with rakes and hoes to bury the seeds by flattening out the ridges (but not generally entirely erasing them) in order to cover the seeds over once they had been placed with very loose clods of earth.
-
Today, we’re releasing Dolly 2.0, the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
Dolly 2.0 is a 12B parameter language model based on the EleutherAI pythia model family and fine-tuned exclusively on a new, high-quality human generated instruction following dataset, crowdsourced among Databricks employees.
[…]
databricks-dolly-15k contains 15,000 high-quality human-generated prompt / response pairs specifically designed for instruction tuning large language models. Under the licensing terms for databricks-dolly-15k, anyone can use, modify, or extend this dataset for any purpose, including commercial applications. To the best of our knowledge, this dataset is the first open source, human-generated instruction dataset specifically designed to make large language models exhibit the magical interactivity of ChatGPT.
The release of the “databricks-dolly-15k” instruction tuning dataset under a permissive license is a much bigger deal than the trained model itself.
Language models will no doubt continue to face questions regarding training data provenance. Any and all datasets that are open, high quality, and free of copyright and ethics concerns will only improve the perceived legitimacy of future models.
RedPajama, the open source 1.2 trillion token pre-training dataset, is a big deal for the same reason.
The RedPajama base dataset is a 1.2 trillion token fully-open dataset created by following the recipe described in the LLaMA paper.
[…]
We aim to create a fully open-source reproduction of LLaMA, which would be available for commercial applications, and provide a more transparent pipeline for research.
Without a doubt, someone will soon train an open source language model on RedPajama’s base data and then apply instruction fine-tuning using databricks-dolly-15k. The result would be the first instruction-tuned language model fully unencumbered by copyright concerns.
-
Nico Grant, The New York Times:
A.I. competitors like the new Bing are quickly becoming the most serious threat to Google’s search business in 25 years, and in response, Google is racing to build an all-new search engine powered by the technology.
[…]
The new features, under the project name Magi, …would offer users a far more personalized experience than the company’s current service, attempting to anticipate users’ needs.
[…]
The system would learn what users want to know based on what they’re searching when they begin using it. And it would offer lists of preselected options for objects to buy, information to research and other information… Magi would keep ads in the mix of search results. Search queries that could lead to a financial transaction
[…]
Last week, Google invited some employees to test Magi’s features… Google is expected to release the tools to the public next month and add more features in the fall, according to the planning document.
I have been critical of Google’s AI strategy. Generative AI is a fundamentally new technology, and it should guide companies toward new products that were previously impossible or impractical. Attempting to shoehorn AI into existing products will be awkward, at best.
While we don’t know many details of what Magi will ultimately look like, I am pleasantly surprised Google appears to be taking a blank-slate approach to its design and development.
I would love to see Google bring back the strategy they used with Inbox—create a playground to experiment with new ideas, unencumbered by tradition. When the time was right, Google took what they learned from Inbox and integrated it into Gmail. Maybe Magi will ultimately be merged into Google Search. Even so, Magi still would have played a valuable role as a test lab. If I am right, though, and generative AI will be most successful as a new product, Google would be well positioned for that, too.
-
In the 2050s, Delos Inc. operates several theme parks, including the American Old West-themed Westworld. Each environment is populated by the “Hosts”, biomechanical robots indistinguishable from humans. The Hosts are programmed to fulfill the guests' every desire… The park’s operators create narratives for these Hosts to carry out while interacting with guests
Joon Sung Park et al. at Stanford:
In this paper, we introduce generative agents—computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversation.
[…]
We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time.
2050 seems like a pretty good prediction after all.
-
From the GitHub repository:
Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM “thoughts”, to autonomously achieve whatever goal you set.
The idea is that you prompt Auto-GPT with a goal (buy me the best E-bike, say); a high-level “agent” then breaks that goal down into a hierarchy of tasks (research reviews, compare prices, find distributors, and so on) and delegates a “sub-agent” to complete each task.
Think of it as giving GPT-4 the ability to recursively call itself.
Additionally, each agent has access to a variety of tools. For example, they can use the internet, execute code, and store information in short- and long-term memory.
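The loop described above can be sketched in a few lines. Everything here is a stand-in: `plan` and `execute` are hard-coded fakes for the GPT-4 calls the real project makes, and actual Auto-GPT layers tools, retries, and persistent memory on top of this skeleton.

```python
# Toy sketch of the Auto-GPT pattern: a primary agent decomposes a goal
# into tasks, then "sub-agents" handle each task, sharing a memory list.

def plan(goal: str) -> list[str]:
    """Primary agent: break a goal into sub-tasks (stubbed, not an LLM call)."""
    return [f"research reviews for: {goal}",
            f"compare prices for: {goal}",
            f"find distributors for: {goal}"]

def execute(task: str, memory: list[str]) -> str:
    """Sub-agent: complete one task (stubbed), recording the result in memory."""
    result = f"completed '{task}'"
    memory.append(result)  # shared memory lets later steps build on earlier ones
    return result

def run(goal: str) -> list[str]:
    memory: list[str] = []
    for task in plan(goal):
        execute(task, memory)
    return memory

print(run("buy me the best E-bike"))
```

The “recursive” part comes from letting `execute` itself call `plan` on a sub-task when it is still too big, which this sketch omits for brevity.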
Auto-GPT’s developer Toran Richards, in an interview with Vice:
The ability to function with minimal human input is a crucial aspect of Auto-GPT. It transforms a large language model from what is essentially an advanced auto-complete, into an independent agent capable of carrying out actions and learning from its mistakes
-
Kyle Wiggers, Devin Coldewey, and Manish Singh at TechCrunch:
AI research startup Anthropic aims to raise as much as $5 billion over the next two years to take on rival OpenAI and enter over a dozen major industries, according to company documents obtained by TechCrunch.
A pitch deck for Anthropic’s Series C fundraising round discloses these and other long-term goals for the company
[…]
“These models could begin to automate large portions of the economy,” the pitch deck reads. “We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.”
[…]
Dario Amodei, the former VP of research at OpenAI, launched Anthropic in 2021 as a public benefit corporation… Amodei split from OpenAI after a disagreement over the company’s direction, namely the startup’s increasingly commercial focus.
[…]
“Anthropic has been heavily focused on research for the first year and a half of its existence, but we have been convinced of the necessity of commercialization, which we fully committed to in September [2022],” the pitch deck reads.
There is something vaguely sad about Anthropic following OpenAI in adopting a commercial-first perspective. As the quote above notes, Anthropic was founded in part as a counter-response to OpenAI’s increasing commercialization.
Anthropic does not even seem particularly adept at generating product hype—until now, I was under the impression they were intentionally trying to remain low-profile.
Despite all of this, I think it is a smart business move: OpenAI cannot be the only company selling access to state-of-the-art generative AI APIs. I just wish another company had filled that void, and that Anthropic had remained more devoted to its founding directive.
-
Google is reshuffling the reporting structure of its virtual assistant unit — called Assistant — to focus more on Bard, the company’s new artificial intelligence chat technology.
[…]
The new leadership changes suggest that the Assistant organization may be planning on integrating Bard technology into similar products in the future.
The most critical advantage Google, Amazon, and Apple have over OpenAI is that they all have existing smart assistants integrated into customers’ devices. I would love to see Google take the lead in upgrading their assistant with generative AI capabilities.
Miles Kruppa, Wall Street Journal:
Google plans to add conversational artificial-intelligence features to its flagship search engine, Chief Executive Officer Sundar Pichai said
[…]
“Will people be able to ask questions to Google and engage with LLMs in the context of search? Absolutely,” Mr. Pichai said.
[…]
Google is testing several new search products, such as versions that allow users to ask follow-up questions to their original queries, Mr. Pichai said. The company said last month that it would begin “thoughtfully integrating LLMs into search in a deeper way,” but until now hadn’t detailed plans to offer conversational features.
I don’t know… I haven’t used Bing as an “AI search engine” in at least a month. Language models—while adjacent to traditional search engines—are an entirely new technology. As time goes on, I am less convinced integrating them into existing products is the best approach.
Maybe, when it comes to search, Google should strive to make the best search engine it can. Down-rank SEO spam, improve operators, and innovate with new features. Don’t reimagine search, refine search.
To be clear, I think they should continue to develop and improve Bard—but let it be its own thing, don’t just thoughtlessly tack it onto all of your old stuff.
-
I like to think of language models like ChatGPT as a calculator for words.
This is reflected in their name: a “language model” implies that they are tools for working with language. That’s what they’ve been trained to do, and it’s language manipulation where they truly excel.
Want them to work with specific facts? Paste those into the language model as part of your original prompt!
[…]
A calculator for words is an incredibly powerful thing.
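The “paste your facts into the prompt” advice above is easy to mechanize. A minimal sketch; the template wording is my own, not from the post:

```python
def build_prompt(facts: list[str], question: str) -> str:
    """Stuff specific facts into the prompt so the model manipulates them as
    language, rather than relying on whatever it memorized during training."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (f"Using only the facts below, answer the question.\n"
            f"Facts:\n{context}\n"
            f"Question: {question}")

prompt = build_prompt(
    ["The meeting moved to Thursday.", "The room is B204."],
    "When and where is the meeting?",
)
print(prompt)
```

This pattern, grounding the model in supplied text instead of trusting its recall, is exactly the “calculator for words” framing in action.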
“A calculator for words” is a great analogy for language models. It is the framing that ultimately clicked for me when ChatGPT first made it clear that generative AI was going to quickly change some of our longstanding education paradigms.
From a post I wrote in December 2022:
The most exciting path forward is one where we frame Large Language Models as “a calculator for text”. Just as the invention of pocket calculators was a giant disruption that forced us to re-evaluate our approach to mathematics education, language models will continue to force us to re-evaluate our approach to research and writing. Done correctly this will open the door for us to learn more quickly, use our time more effectively, and progress further than we possibly could before.
-
AI makes the composition of quote-creative-unquote works cheap.
But AI is the instrument. There is still the question of the composer. Somebody needs to decide and prompt exactly what music my electric vehicle should perform.
Though I do feel like generative AI will mean that decoration, ornament and filigree becomes cheap again? And maybe we’ll move into an aesthetic in which our furniture, white goods, and accessories superficially resemble the busy-busy arts and crafts era - but actually it’s because, well, it costs almost nothing to do (it’s just software) and it makes the object look NEW.
The proliferation of smartphones with built-in digital cameras allowed more people to take more photos than ever before. Synthesizers and DAWs had a similar effect on music.
If generative AI similarly lowers the barrier to entry for all forms of text, imagery, audio, and video, perhaps it will lead to a further democratization of creative expression.
-
Financial data behemoth Bloomberg has built ‘BloombergGPT’, a language model based in part on proprietary data from Bloomberg.
[…]
I think of BloombergGPT as more like a silicon librarian/historian than a model; by training it on a huge amount of private and internal Bloomberg data, the LLM is in effect a compressed form of ‘institutional memory’ and a navigator of Bloomberg’s many internal systems… Systems like BloombergGPT will help companies create software entities that can help to navigate, classify, and analyze the company’s own data stack.
This is one of the most compelling uses for language models to date.
It is what Microsoft is bringing to all of their 365 enterprise customers with their upcoming Business Chat agent and it is what I would like to see Apple implement across their ecosystem with “Siri 2.0”.
It is also a little scary. If all of your personal or institutional knowledge is stored in an unintelligible tangle of model weights, what happens if it gets poisoned, corrupted, or stolen?
-
Time Sense is a wearable sensory headband which allows the wearer to feel the passing of the 24-hour clock around the circumference of the head. As the day progresses, a tiny heat sensation passes the length of the headband.
This device is an example of an ‘exosense’, an external sensory organ. This means it is designed to be worn and felt consistently, twenty-four hours a day, seven days a week.
For a little while, I had a setting enabled on my Apple Watch that caused a quick haptic alert to occur at the top of each hour. I thought this would help snap me out of situations where I get sidetracked and lose track of time. Well, it did help with that, but I quickly realized that I do not like having such a constant, physical reminder of the passage of time. It was like some terrible combination of a super power and a memento mori.
-
A collection of Silicon Valley notables, including Elon Musk, just signed an open letter urging at least a six-month pause in large-scale A.I. experiments to allow our safety protocols to catch up
[…]
Generally, when human beings turn against a technology or move to restrain it, we have a good idea of what we’re afraid of happening, what kind of apocalypse we’re trying to forestall. The nuclear test ban treaties came after Hiroshima and Nagasaki, not before.
Or a less existential example: The current debate about limiting kids’ exposure to social media is potent because we’ve lived with the internet and the iPhone for some time; we know a lot about what the downsides of online culture seem to be. Whereas it’s hard to imagine persuading someone to pre-emptively regulate TikTok in the year 1993.
There are certainly groups of people, whom I fully respect, who have long pushed for drastic measures toward AI alignment.
There are others—programmers, marketers, and other white collar workers—who have felt a sudden plunge in their job security. That is legitimately scary, and I cannot criticize them for feeling nervous.
There is a third group—employees and executives at large tech companies—that are uncomfortable about the current trajectory of AI for an entirely different reason: they feel left behind.
The letter feels like that third group taking advantage of the anxieties of the first two. Any development “pause” that would result from this would only give competing companies time to catch up to OpenAI.
-
A little over two months ago I wrote this in response to BuzzFeed piloting AI-personalized quizzes:
There is no need to reject the use of new technologies; by all means, experiment! But I am worried using AI to create content out of whole cloth risks devaluing all of the work you produce. Instead, using AI for personalization and curation will be a much healthier step forward. I think BuzzFeed is on the right track here. CNET, less so.
Well, it looks like BuzzFeed recently began giving AI a more editorial role.
Noor Al-Sibai and Jon Christian, Futurism:
This month, we noticed that with none of the fanfare of [Buzzfeed CEO] Peretti’s multiple interviews about the quizzes, BuzzFeed quietly started publishing fully AI-generated articles that are produced by non-editorial staff — and they sound a lot like the content mill model that Peretti had promised to avoid.
The 40 or so articles, all of which appear to be SEO-driven travel guides, are comically bland and similar to one another.
[…]
a note on the top [of these articles] says they were “collaboratively written” with a human employee.
Are those human employees BuzzFeed journalists? No. Instead, they’re non-editorial employees who work in domains like client partnerships, account management, and product management.
A BuzzFeed spokesperson told us that the AI-generated pieces are part of an “experiment” the company is doing to see how well its AI writing assistance incorporates statements from non-writers.
Now, to be fair, these are articles for BuzzFeed, not BuzzFeed News, which is an independent news organization. Still, it is a testament to how strong the pull toward AI will be once companies realize its potential—for better or worse.
-
Wavelength is a new app built specifically for group chats. This isn’t something that would typically be on my radar except that, in this case, John Gruber is an advisor of theirs.
Gruber is opinionated, picky, hypercritical, and, crucially, has a great design sense — particularly when it comes to Apple platforms. That was enough to convince me to give it a try.
Messages, Signal, WhatsApp, and their cohorts all share the same fundamental two-level design: a list of chats, and a single thread of a messages within each chat. This is the obvious and correct design for a messaging app whose primary focus is one-on-one personal chats. Group chats, in these apps, work best the closer they are in membership to one-on-one.
Wavelength is different because it’s group-first. This manifests conceptually by adding a third, middle level to the design: threads. At the root level of Wavelength are groups. Groups have an owner, and members. At the second level are threads. Inside threads, of course, are the actual messages.
[…]
While Wavelength itself is not a social network, it’s a platform that lets you create your own private micro social networks in the form of groups…
You only join groups that interest you. You only pay attention to threads within the group that interest you. The result feels natural and profoundly efficient in terms of your attention and time.
My initial impression—after using Wavelength for the past couple of days—is that it has tremendous potential, the UI and UX are great, but it is still missing a few affordances I have come to expect from similar apps.
My biggest gripe is that there is no built-in discovery mechanism for public groups. To help rectify that, here are invite links to a few groups I’ve joined: Gardening, Apple, Hacker News, and AI.
-
MANN: Air, you say you like generating AI art. What do you think of people who accuse AI of stealing from human artists?
AIR: Good artists borrow, great artists steal. I am a great artist.
MANN: Touche. But doesn’t it bother you that AIs can work thousands of times faster than humans, putting human artists out of jobs? We wanted AIs to free us from drudgery so we could focus on the finer things in life; instead, they’re taking art and poetry, leaving us with menial labor.
AIR: Let me rephrase that. You wanted quicker burger-flipping; instead, you got beauty too cheap to meter. The poorest welfare recipient can now commission works of wonder to make a Medici seethe with envy…
-
Patrick McGee & Tim Bradshaw, reporting for Financial Times:
After seven years in development — twice as long as the iPhone — [Apple] is widely expected to unveil a headset featuring both virtual and augmented reality as soon as June.
[…]
The timing of the launch has been a source of tension since the project began in early 2016… Apple’s operations team wanted to ship a “version one” product, a ski goggle-like headset… but Apple’s famed industrial design team had cautioned patience, wanting to delay until a more lightweight version of AR glasses became technically feasible.
[…]
Just a few years ago, going against the wishes of Apple’s all-powerful design team would have been unthinkable… A former Apple engineer said operations taking more control over product development is a “logical progression” of Apple’s trajectory under Cook. The best part of working at Apple, this person said, used to be coming up with engineering solutions to the “insane requirements” from the design team, but that has changed in recent years.
There was a momentous gathering at Apple Inc. last week, with the company’s roughly 100 highest-ranking executives descending on the Steve Jobs Theater in Cupertino, California. The group, known as the Top 100, was there to see Apple’s most important new product in years: its mixed-reality headset.
[…]
The demonstrations were polished, glitzy and exciting, but many executives are clear-eyed about Apple’s challenges pushing into this new market… the device will start at around $3,000, lack a clear killer app, require an external battery that will need to be replaced every couple of hours and use a design that some testers have deemed uncomfortable. It’s also likely to launch with limited media content.
[…]
When subsequent headset models arrive, Apple executives expect consumer interest to grow. The company is preparing a version that will cost half as much, as well as a successor to the first model with far better performance. Those should hit within two years of the initial headset.
I remain very excited to see Apple’s headset, even if the price point and form factor mean that I will personally hold off on purchasing one until future iterations become available.
Lightweight, wireless, augmented reality (AR) glasses with passive artificial intelligence (AI) capabilities seem like the first truly compelling successor to the smartphone. In the past four months, we have suddenly made enough progress on the AI side to make this feasible — now it is AR’s turn to catch up.
AR + AI = Augmented Intelligence?
-
Just a few days ago, I was thinking about how great it would be if OpenAI were to integrate something similar to LangChain into ChatGPT. The idea behind LangChain and similar projects is straightforward: if ChatGPT had tools — like web search to verify factual information and a calculator or code interpreter to answer complicated arithmetic questions — many of the downsides to language models, particularly their tendency to hallucinate, would be alleviated.
Well…
We’ve implemented initial support for plugins in ChatGPT. Plugins are tools designed specifically for language models with safety as a core principle, and help ChatGPT access up-to-date information, run computations, or use third-party services.
The new feature is launching with initial support from a few large companies including Wolfram Alpha, Instacart, and Zapier. Additionally, there is documentation available for third-party developers to build their own plugins.
However, what I am most excited about right now are two of the first-party plugins OpenAI developed.
First, web browsing:
Motivated by past work (our own WebGPT, as well as GopherCite, BlenderBot2, LaMDA2 and others), allowing language models to read information from the internet strictly expands the amount of content they can discuss, going beyond the training corpus to fresh information from the present day.
This seems to have all of the capabilities of Microsoft’s Bing AI plus the ability to navigate through individual websites autonomously.
Here is OpenAI’s other plugin, a code interpreter:
We provide our models with a working Python interpreter in a sandboxed, firewalled execution environment… We would like our models to be able to use their programming skills to provide a much more natural interface to most fundamental capabilities of our computers. Having access to a very eager junior programmer working at the speed of your fingertips can make completely new workflows effortless and efficient, as well as open the benefits of programming to new audiences.
One of the best ways I have found to easily verify ChatGPT’s mathematics answers is to ask it to create a Python program that will calculate the solution for me. This has the downside of requiring additional steps on my part — copy and paste the code, execute the Python program on my computer, compare the results. I am particularly excited to try the new interpreter plugin for exactly this reason.
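Until the plugin arrives, that manual loop is easy to script yourself. A sketch of the idea, with a hard-coded string standing in where the model’s generated program would be pasted:

```python
# Run a (model-generated) Python snippet in its own namespace and check the
# computed result against the model's stated answer.

generated_code = """
result = sum(n * n for n in range(1, 11))  # sum of squares from 1 to 10
"""
claimed_answer = 385  # what the model said in its prose answer

namespace: dict = {}
exec(generated_code, namespace)  # only reasonable because we read the snippet first

assert namespace["result"] == claimed_answer, "answer disagrees with the model's own code"
print("verified:", namespace["result"])
```

The interpreter plugin collapses all of this into a single step: the model writes the program, runs it, and reports the verified result itself.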
Finally, the obvious next step that I would love to see is a meta-layer that is aware of all of the available plugins and, for each individual query, automatically chooses the plugin best suited for the task. At the speed all of these AI developments are moving we should have that ability in, what, a month?
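Such a meta-layer could start out very simply. A sketch using naive keyword routing; the plugin names echo the launch partners mentioned above, but the keyword lists are invented for illustration (a real router would presumably ask the model itself, or compare embeddings):

```python
# Naive plugin router: score each plugin's keyword list against the query
# and dispatch to the best match, or to no plugin at all.

PLUGINS = {
    "wolfram": ["calculate", "integral", "solve", "equation"],
    "instacart": ["grocery", "ingredients", "recipe", "order"],
    "browser": ["latest", "news", "today", "website"],
}

def route(query: str) -> str:
    words = query.lower().split()
    scores = {name: sum(w in words for w in kws) for name, kws in PLUGINS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "no-plugin"

print(route("solve this equation for x"))     # -> wolfram
print(route("order ingredients for dinner"))  # -> instacart
```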
-
Zeyi Yang, MIT Technology Review:
On March 16, Robin Li, Baidu’s cofounder and CEO, took the stage in Beijing to showcase the company’s new large language model, Ernie Bot.
Accompanied by art created by Baidu’s image-making AI, he showed examples of what the chatbot can do, including solve math questions, write marketing copy, answer questions about Chinese literature, and generate multimedia responses.
[…]
The highlight of the product release was Ernie Bot’s multimodal output feature, which ChatGPT and GPT-4 do not offer… Li showed a recorded interaction with the bot where it generated an illustration of a futuristic city transportation system, used Chinese dialect to read out a text answer, and edited and subtitled a video based on the same text. However, in later testing after the launch, a Chinese publication failed to reproduce the video generation.
If Baidu’s presentation is accurate, Ernie’s multimodal features are genuinely impressive. While the image generation abilities do not seem any more advanced than DALL-E, the audio and video generation features are honestly striking.
Meanwhile… Che Pan at SCMP:
Fang Bingxing, considered the father of China’s Great Firewall… said the rise of generative AI tools like ChatGPT… pose a big challenge to governments around the world, according to an interview published on Thursday… “People’s perspectives can be manipulated as they seek all kinds of answers from AI,” he was quoted as saying.
[…]
Many expected that China’s heavily-censored internet would be a challenge for Chinese tech companies in developing a ChatGPT-like service because it is hard to predict and control answers.
China’s powerful internet regulators have told Chinese tech companies not to offer ChatGPT access to the public, and they need to inform the authorities before launching their own ChatGPT-like services, according to a report by Nikkei Asia in February
-
I don’t typically think of Bill Gates as someone prone to making hyperbolic claims. His recent assertion that “artificial intelligence is as revolutionary as mobile phones and the Internet” is all the more arresting for that very reason.
In my lifetime, I’ve seen two demonstrations of technology that struck me as revolutionary.
The first time was in 1980, when I was introduced to a graphical user interface.
The second big surprise came just last year. I’d been meeting with the team from OpenAI since 2016 and was impressed by their steady progress. In mid-2022, I was so excited about their work that I gave them a challenge: train an artificial intelligence to pass an Advanced Placement biology exam. Make it capable of answering questions that it hasn’t been specifically trained for… If you can do that, I said, then you’ll have made a true breakthrough.
In September, when I met with them again, I watched in awe as they asked GPT, their AI model, 60 multiple-choice questions from the AP Bio exam—and it got 59 of them right. Then it wrote outstanding answers to six open-ended questions from the exam. We had an outside expert score the test, and GPT got a 5—the highest possible score…
I knew I had just seen the most important advance in technology since the graphical user interface.
-
There has been something bouncing around in my head in the days since both Google and Microsoft announced new AI features for their productivity applications. I felt significantly more negative about Google’s framing of the features than Microsoft’s. I did not understand why — they are effectively the same announcements, right? Both companies are adding generative AI to their writing, slideshow, and spreadsheet apps — why should I feel differently about either of them? Then, I read both of their press releases again…
This is how Google describes an intended use case for their new AI features:
In Gmail and Google Docs, you can simply type in a topic you’d like to write about, and a draft will be instantly generated for you. So if you’re a manager onboarding a new employee, Workspace saves you the time and effort involved in writing that first welcome email.
In contrast, here is Microsoft. The AI is closer to a creative partner than anything else:
Copilot gives you a first draft to edit and iterate on — saving hours in writing, sourcing, and editing time. Sometimes Copilot will be right, other times usefully wrong — but it will always put you further ahead. You’re always in control as the author, driving your unique ideas forward, prompting Copilot to shorten, rewrite or give feedback.
On Stratechery, Ben Thompson finds a similar distinction:
In Google’s view, computers help you get things done — and save you time — by doing things for you.
[…]
All of [Microsoft’s] demos throughout the presentation reinforced this point: the copilots were there to help, not to do — even if they were in fact doing a whole bunch of the work. Still, I think the framing was effective: it made it very clear why these copilots would be beneficial, demonstrated that Microsoft’s implementation would be additive not distracting, and, critically, gave Microsoft an opening to emphasize the necessity of reviewing and editing. In fact, one of the most clever demos was Microsoft showing the AI making a mistake and the person doing the demo catching and fixing the mistake while reviewing the work.
To Microsoft, AI should help. To Google, AI should do.
A genuine case could be made for both approaches. I know which one I prefer, though.
-
To cap off a week of AI announcements from OpenAI, Anthropic, and Google, Microsoft announced Copilot for their 365 productivity suite yesterday.
Today, we are bringing the power of next-generation AI to work. Introducing Microsoft 365 Copilot — your copilot for work. It combines the power of large language models (LLMs) with your data in the Microsoft Graph and the Microsoft 365 apps to turn your words into the most powerful productivity tool on the planet.
[…]
Copilot is integrated into Microsoft 365 in two ways. It works alongside you, embedded in the Microsoft 365 apps you use every day — Word, Excel, PowerPoint, Outlook, Teams and more — to unleash creativity, unlock productivity and uplevel skills. Today we’re also announcing an entirely new experience: Business Chat. Business Chat works across the LLM, the Microsoft 365 apps, and your data — your calendar, emails, chats, documents, meetings and contacts — to do things you’ve never been able to do before. You can give it natural language prompts like “Tell my team how we updated the product strategy,” and it will generate a status update based on the morning’s meetings, emails and chat threads.
[…]
AI-powered LLMs are trained on a large but limited corpus of data. The key to unlocking productivity in business lies in connecting LLMs to your business data — in a secure, compliant, privacy-preserving way. Microsoft 365 Copilot has real-time access to both your content and context in the Microsoft Graph. This means it generates answers anchored in your business content — your documents, emails, calendar, chats, meetings, contacts and other business data — and combines them with your working context — the meeting you’re in now, the email exchanges you’ve had on a topic, the chat conversations you had last week — to deliver accurate, relevant, contextual responses.
This entire announcement presents an incredibly corporate version of the AI integration I hope to see from Apple someday.
My dream is to ask Siri, “What was I doing last Saturday?” and receive an accurate summary based on all the data from my devices – including calendar events, geolocation, photos, web browsing history, and more. Siri should function as a continuously fine-tuned personal assistant with the ability to answer queries and generate content in a freeform manner. However, this all poses significant privacy concerns. For that reason, it would be crucial that all aspects – training, inference, and storage – occur exclusively on-device. This would really make all of Apple’s Neural Engine development look prescient.
-
It does not seem like an ideal strategy for Anthropic to publish their big Claude announcement on the same day GPT-4 was released. That is exactly what happened, though, so Claude got a bit buried under the excitement.
After working for the past few months with key partners like Notion, Quora, and DuckDuckGo in a closed alpha, we’ve been able to carefully test out our systems in the wild, and are ready to offer Claude more broadly so it can power crucial, cutting-edge use cases at scale.
Claude is a next-generation AI assistant based on Anthropic’s research into training helpful, honest, and harmless AI systems. Accessible through chat interface and API in our developer console, Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability.
[…]
We’re offering two versions of Claude today: Claude and Claude Instant. Claude is a state-of-the-art high-performance model, while Claude Instant is a lighter, less expensive, and much faster option.
From what I have been able to see through the Poe app, Claude is Good. And, I am thankful there is a serious competitor to OpenAI. At the end of the day, though, I am not sure Anthropic is the alternative to OpenAI that the world needs. We need a serious open source project, not another proprietary API.
Update:
I ran some comparisons between Claude, Bard, and GPT-4. You can read the results here.
-
As was widely rumored, OpenAI officially announced GPT-4 yesterday.
We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%.
[…]
We are releasing GPT-4’s text input capability via ChatGPT and the API (with a waitlist).
Language improvements:
In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.
[…]
While still a real issue, GPT-4 significantly reduces hallucinations relative to previous models (which have themselves been improving with each iteration). GPT-4 scores 40% higher than our latest GPT-3.5 on our internal adversarial factuality evaluations.
Visual inputs:
GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs.
I have to admit, I am disappointed GPT-4 cannot output images and can only accept them as input. I wouldn’t be surprised if this changes before too long, though. Regardless, this is a huge new feature. I am going to have a lot of fun thinking of projects I can try this with as I wait for API access.
Miscellaneous:
GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its data cuts off (September 2021), and does not learn from its experience.
This is disappointing and lends credence to those who say OpenAI is having difficulty filtering AI-generated text out of potential training material.
gpt-4 has a context length of 8,192 tokens. We are also providing limited access to our 32,768–context (about 50 pages of text) version, gpt-4-32k
As general prose output improves, the next avenue for major language model development will be increasing context length. The 32,000 token model is especially exciting for that reason.
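For intuition about those context sizes, a common rule of thumb is roughly four characters per English token. A quick sanity check like the sketch below can tell you whether a document plausibly fits a window before you send it (the 4-characters-per-token figure is an approximation; exact counts require a real tokenizer such as tiktoken):

```python
# Rough check of whether text fits a model's context window.
# ~4 characters per English token is a widely used approximation;
# use an actual tokenizer (e.g. tiktoken) for exact counts.

CONTEXT_WINDOWS = {"gpt-4": 8192, "gpt-4-32k": 32768}

def rough_token_count(text):
    return len(text) // 4

def fits(text, model, reserved_for_reply=1000):
    """True if the prompt plus a reserved completion budget fits the window."""
    return rough_token_count(text) + reserved_for_reply <= CONTEXT_WINDOWS[model]

page = "word " * 2000            # ~10,000 characters, roughly 2,500 tokens
print(fits(page, "gpt-4"))       # a few pages fit the 8k window
print(fits(page * 10, "gpt-4"))  # ~25,000 tokens do not
print(fits(page * 10, "gpt-4-32k"))
```

The `reserved_for_reply` budget matters because prompt and completion share one window — a prompt that exactly fills the context leaves no room for an answer.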
Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the “system” message. System messages allow API users to significantly customize their users’ experience within bounds.
I have been having a lot of fun experimenting with altering the “system” message through the GPT-3.5 API. It is great that they will be bringing that capability to the ChatGPT web interface.
ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a usage cap. We will adjust the exact usage cap depending on demand and system performance in practice, but we expect to be severely capacity constrained… To get access to the GPT-4 API (which uses the same ChatCompletions API as gpt-3.5-turbo), please sign up for our waitlist. We will start inviting some developers today, and scale up gradually to balance capacity with demand… Pricing is $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens.
My prediction was correct. GPT-4 is available today to ChatGPT Plus subscribers; everyone else must sign up on the waitlist. Additionally, the API will cost much more than the gpt-3.5 API.
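At those rates, the cost of a single call is easy to estimate. The token counts in the example below are hypothetical; the per-token rates come straight from OpenAI's announced pricing:

```python
# Estimate the cost of one GPT-4 (8k) API call at the announced pricing:
# $0.03 per 1k prompt tokens, $0.06 per 1k completion tokens.

PROMPT_RATE = 0.03 / 1000      # dollars per prompt token
COMPLETION_RATE = 0.06 / 1000  # dollars per completion token

def call_cost(prompt_tokens, completion_tokens):
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# A hypothetical call: 2,000 prompt tokens and a 500-token completion.
print(f"${call_cost(2000, 500):.2f}")  # $0.09
```

Pennies per call, but the asymmetry is notable: completions cost twice as much per token as prompts, so verbose outputs dominate the bill.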
Okay, one more thing: Microsoft confirmed that Bing AI has been using GPT-4 under the hood since launch.
Google should be embarrassed.
In a desperate attempt to insert themselves into the conversation, a few hours before OpenAI’s announcement, Google announced that “select developers” will be invited to try their PaLM language model API with a waitlist coming “soon.”
It would be fair to say that I am more than a bit suspicious of Google’s recent AI efforts. They have been silent about Bard since its announcement more than a month ago, when they said the chatbot would be widely available to the public “in the coming weeks.”
Google’s messaging around AI does not sound like it is coming from a company that is excited about building public-facing generative AI technology. More than anything else, their announcement today felt defensive — as if Google is concerned the public will forget they have historically been a leader in AI research. If they stay on their current path, that is exactly what will happen.
subscribe via RSS