-
While I wish development efforts would coalesce around a single, capable, open source language model, it is undoubtedly interesting to see the variety of new entrants in this space.
Introducing MPT-7B, the latest entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B… we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+
The base model, the instruction-tuned model, and the 65k model are all commercially licensed. The chat model has a Creative Commons non-commercial license because it was finetuned on data from OpenAI.
I tried out the chat model on Hugging Face. At first glance, it seems comparable to other open source models I have previously used.
The most surprising announcement here is certainly StoryWriter-65k. The (still unreleased) large context-length version of GPT-4 will be able to handle 32,000 tokens—less than half of what is possible here.
-
§ Rabbit rabbit. Happy May.
§ I drove a pickup truck for the first time.
I needed to rent one from the local big box store to bring home 4x8' polycarbonate panels for the greenhouse’s roof. The surprisingly good visibility made the drive a much less anxiety inducing expirence compared to similarly sized cargo vans I have driven in the past. The downside was that I kept forgetting how long the vehicle really is—this is where a backup camera would have helped tremendously.
I managed to install the new roof right before the start of an endlessly rainy week. This was good for testing the roof, I suppose, but bad for getting any other garden work accomplished.
§ Caroline and I are planning a weekend trip to Confluence, Pennsylvania for my birthday later this month. The plan is to visit Frank Lloyd Wright’s Fallingwater house and the Ohiopyle State Park. Exciting!
§ Caroline brought home a bunch of porcelain tiles she made. We are trying to figure out a fun way to incorporate them into the greenhouse.
§ Links
- Synthetic Summer
- An adoorable game
- News Minimalist
§ Recipes
I finally got the opportunity to use a bunch of the—extremely prolific—chives from the garden in a mushroom risotto this week. Would it have been better to make this recipe when I was growing my own mushrooms? Absolutely. Why didn’t I? Who knows!
-
Yesterday, SemiAnalysis shared a leaked document that purports to be an internal memo written by an engineer at Google:
We’ve done a lot of looking over our shoulders at OpenAI. Who will cross the next milestone? What will the next move be?
But the uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch.
I’m talking, of course, about open source. Plainly put, they are lapping us… While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.
The anonymous engineer attributes many of these open source developments to the leak of Meta’s LLaMA model:
At the beginning of March the open source community got their hands on their first really capable foundation model, as Meta’s LLaMA was leaked to the public. It had no instruction or conversation tuning, and no RLHF. Nonetheless, the community immediately understood the significance of what they had been given.
A tremendous outpouring of innovation followed, with just days between major developments… Here we are, barely a month later, and there are variants with instruction tuning, quantization, quality improvements, human evals, multimodality, RLHF, etc. etc. many of which build on each other.
Indeed, while open source language models still lag behind state of the art closed models, the speed of development is unparalleled. The quality gap has already closed with text-to-image models and there is no reason to think the same won’t happen with LLMs.
To be clear, I think centralized models will remain important, if only for thin clients where compute power is limited. Who knows, though, it is always possible that open source development ends up driving a commoditization of language models where there is no reason to call OpenAI’s API over a random AWS endpoint.
The author concludes that, for this reason, Google should contribute to the open source community instead of attempting to compete with it:
This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?
[…]
The more tightly we control our models, the more attractive we make open alternatives. Google and OpenAI have both gravitated defensively toward release patterns that allow them to retain tight control over how their models are used. But this control is a fiction. Anyone seeking to use LLMs for unsanctioned purposes can simply take their pick of the freely available models.
Google should establish itself a leader in the open source community, taking the lead by cooperating with, rather than ignoring, the broader conversation. This probably means taking some uncomfortable steps, like publishing the model weights for small ULM variants. This necessarily means relinquishing some control over our models. But this compromise is inevitable. We cannot hope to both drive innovation and control it.
A pivot towards open source is a great strategy that would clearly differentiate Google from OpenAI at a time when Google sorely needs it. But remember, this memo was written by a Google engineer, not someone from Google’s leadership—the higher-ups at Google appear to be moving in the opposite direction.
Nitasha Tiku, Washington Post:
In February, Jeff Dean, Google’s longtime head of artificial intelligence, announced a stunning policy shift to his staff: They had to hold off sharing their work with the outside world.
For years Dean had run his department like a university, encouraging researchers to publish academic papers prolifically; they pushed out nearly 500 studies since 2019
[…]
Things had to change. Google would take advantage of its own AI discoveries, sharing papers only after the lab work had been turned into products, Dean said, according to two people with knowledge of the meeting
If leadership wants to be proactive, the clock is ticking. During Meta’s most recent earnings call, Mark Zuckerberg made it clear that they intend to embrace an open source approach to AI moving forward.
Mark:
Right now most of the companies that are training large language models have business models that lead them to a closed approach to development. I think there’s an important opportunity to help create an open ecosystem. If we can help be a part of this, then much of the industry will standardize on using these open tools and help improve them further.
[…]
I mentioned LLaMA before and I also want to be clear that while I’m talking about helping contribute to an open ecosystem, LLaMA is a model that we only really made available to researchers and there’s a lot of really good stuff that’s happening there. But a lot of the work that we’re doing, I think, we would aspire to and hope to make even more open than that. So, we’ll need to figure out a way to do that.
If Google remains in stasis for much longer, an open source first philosophy would no longer be unique—it would look like Google decided to start copying Meta instead of OpenAI.
-
Topher Sanders and Dan Schwartz, ProPublica:
Every day across America, [trains] park in the middle of neighborhoods and major intersections, waiting to enter congested rail yards or for one crew to switch with another. They block crossings, sometimes for hours or days, disrupting life and endangering lives.
[…]
In Hammond [Illinois], the hulking trains of Norfolk Southern regularly force parents, kids and caretakers into an exhausting gamble: How much should they risk to get to school?
For their part, the Norfolk Southern executives sound like lovely people:
[The Hammond school district] has asked Norfolk Southern for its schedule so that the schools can plan for blockages and students can adjust their routines. The company has disregarded the requests, school officials said.
Mayor Thomas McDermott Jr. said that his experience with the rails has been similar, and that company officials have reminded him the rails “were here first,” running through Hammond before it was even a city.
Click through to the article for some of the most powerful and nerve-racking photojournalism I have ever seen.
Less than a week after publishing the above investigation, Sanders and Schwartz wrote a brief follow-up:
Within 48 hours of an investigation about children having to crawl under parked trains to get to school in an Indiana suburb, residents packed a public meeting to demand solutions, the Federal Railroad Administration issued a safety advisory, a bipartisan group of Indiana lawmakers sent a letter to the U.S. Department of Transportation pleading for change and Norfolk Southern’s CEO, Alan Shaw, got involved.
[…]
The day after the story was published, [Mayor Thomas McDermott Jr.] got a call from [Norfolk Southern’s CEO, Alan Shaw], who told him he was shocked by the situation in Hammond and wanted to help fix it. “I don’t want to divulge too much about what we talked about, but if it works out the way I hope it does, it will be spectacular,” the mayor said.
-
Inflection is a new AI startup run by Mustafa Suleyman, Karén Simonyan, and Reid Hoffman. Their new language model, Pi, isn’t designed to be the most technically advanced AI on the market. The goal is, instead, to create the most friendly and conversational bot available—“more like a sounding board than a repackaged Wikipedia answer.”
Named Pi for “personal intelligence,” Inflection’s first widely released product — made available today… is supposed to play the active listener, helping users talk through questions or problems over back-and-forth dialog it then remembers, seemingly getting to know its user over time. While it can give fact-based answers, it’s more personal than OpenAI’s GPT-4… without the virtual companionship veering into unhealthy parasocial relationships reported by some users of Replika bots.
I tried the Pi iOS app. From my brief interactions so far, the AI seems fine. My biggest critique would be that it appears to be both a bit bland and quite repetitive at the moment—one of its first responses included the phrase “are you pulling my non-existent leg?” that was also present in the linked Forbes article.
I find the overall idea of a personable LLM fascinating, though. I was immediately reminded of a recent Stratechery interview where Daniel Gross made the case for a “funny” language model:
…The sad thing to me, and actually the really alarming thing to me, is not the capability of the models or whether it’s connected to the Internet or not. To me, it’s the fact that… no one has really spent time making them sort of wonderful and fun in a Pixar way. We don’t have a John Lasseter or a Walt Disney who’s really focused on the technology but also the enjoyment of the model.
[…]
I think we’re missing… people that can really think deeply about how to make a very funny LLM. I’ve been shouting at Nat and anyone else who will listen to me that we need to find someone making a really funny language model… there are many more papers about LLMs doing math than LLMs being funny. But I think actually being funny is much more important and I would argue, broadly, a very important direction, if you think about broader AI safety risk and all that sort of stuff, it should feel like as if we’re creating the world’s best pet, not the world’s smartest actuary.
I do not think Inflection has achieved Daniel’s goal yet but I am glad someone is out there trying.
After all, we rarely view our personal relationships as a meritocracy—we value kindness, empathy, and humor over raw intelligence. Going back to technology, Apple is one of the biggest companies in the world. Is it because they sell the most cutting edge technology or deliver the greatest price-performance ratio? Absolutely not—Apple is so successful because they understand design is more important than pure technical benchmarks.
The Pi app is wonderfully well designed, now they just need to continue working on the model.
-
Researchers at MIT and Columbia University recently published a study measuring whether using an AI language model that answers questions using the Socratic method results in higher levels of critical thinking and metacognition compared to standard language model interactions.
I tested something similar after OpenAI used a “Socratic tutor” prompt to demonstrate the “steerability” of GPT-4. I found the experience to be surprisingly transformative compared to the standard ChatGPT “Q&A” workflow.
Valdemar Danry, Pat Pataranutaporn, Yaoli Mao, and Pattie Maes:
AI models can be biased, deceptive, or appear more reliable than they are, leading to dangerous decision-making outcomes… This is especially concerning when AI systems are used in conjunction with humans, as people have a tendency to blindly follow the AI decisions and stop using their own cognitive resources to think critically.
[…]
This paper presents the novel idea of AI-framed Questioning inspired by the ancient method of Socratic questioning that uses intelligently formed questions to provoke human reasoning, allowing the user to correctly discern the logical validity of the information for themselves. In contrast to causal AI-explanations that are declarative and have users passively receiving feedback from AI systems, our AI-framed Questioning method provides users with a more neutral scaffolding that leads users to actively think critically about information.
[…]
Our results show that AI-framed Questioning increase the discernment accuracy for flawed statements significantly over both control and causal AI-explanations of an always correct AI system.
Assuming the results here are accurate, what are the implications for traditional pedagogy?
-
The other day I was sleepy and, after a long day of grading, I thought I was reading an especially uninspired essay about Walter Benjamin’s “Work of Art in the Age of Mechanical Reproduction,” and I responded with a few hundred words of feedback and questions. Then it hit me. With gritted teeth, I pasted the essay into a text box, and yep, three of the algorithms that check the other algorithms delivered a 92% result.
And it was about the goddamned work of art in the goddamned age of mechanical reproduction, of all things. Perhaps I should just enjoy the beautiful irony here, but the image of myself spending my brief time on this planet thoughtfully reading and responding to the patterns of an algorithm fills me with a horror that edges toward the existential.
-
§ Caroline and I went to the local nature reserve for the first time in a few weeks—it is suddenly all so green! We placed a few wild grape vine branches in the creek to soften. The hope is that they will still be there next time we visit and they will be flexible enough to use for basket weaving.
§ We spent Saturday at the Geauga County Maple Festival. On the way home, we got some milk from a small dairy farm and eggs from a farmer down the street who has found himself in Ohio by way of San Francisco.
§ I saw a wild turkey in a neighbor’s front yard. I didn’t know we had those here.
§ Barbarian is one of the most creative horror movies I have seen recently. Sure, the villain is a bit hokey but the filmmaking, storytelling, and atmosphere more than makes up for it.
I would love to see a House of Leaves movie by the same director.
§ Caroline made a giant, 36x12 inch, stained glass window for the greenhouse. I finished the last big wall and made a door.
The whole thing is nearly complete. The last big task is to finish filling in the wall surrounding the new window. Other than that, I would like to replace the roof with a better sheet of polycarbonate and add more gravel to the floor.
§ Links
- Buckeye chicken
- The Whole Code Catalog — a catalog of “futuristic computational interfaces"
- Karl Nawrot’s fonts
- Also: Radim Peško‘s fonts
- Also: A font for knitting
- Bending Wood: what you need to know
§ Recipes
- Al pastor tacos using pork from the local group share. I used Kenji’s marinade and Mike’s cooking technique. I couldn’t find achiote, despite checking three different grocery stores. It was still good without it!
-
Matt Webb recently shared an approach to controlling smart home infrastructure with language models—a step towards his ultimate goal of creating “a new operating system for physical space”
I spent Friday night and Saturday at the London AI Hackathon… I buddied up with old colleague Campbell Orme and together we built Lares: a simulation of a smart home, with working code for an generative-AI-powered assistant.
[…]
It’s using the ReAct pattern, which is straightforward and surprisingly effective… This pattern gets the AI to respond by making statements in a Thought/Action/PAUSE/Observation loop
[…]
Generally with the ReAct pattern the tools made available to the AI allow it to query Google, or look up an article in Wikipedia, or do a calculation… For Lares we made the smart home into a tool. We said: hey here are the rooms, here are the devices, and here are their commands, do what you want.
After a certain point, especially once you give an AI agent the ability to act on your behalf—turn on and off your lights, send emails as you, lock and unlock the doors to your house…—security vulnerabilities start to become a serious concern.
In a recent blog post, Simon Willison proposed a potential solution to prompt injection attacks. He suggests filtering all user requests through a bespoke “security” LLM before sending it off to a more powerful “agent” LLM:
I think we need a pair of LLM instances that can work together: a Privileged LLM and a Quarantined LLM.
The Privileged LLM is the core of the AI assistant. It accepts input from trusted sources—primarily the user themselves—and acts on that input in various ways.
It has access to tools: if you ask it to send an email, or add things to your calendar, or perform any other potentially destructive state-changing operation it will be able to do so, using an implementation of the ReAct pattern or similar.
The Quarantined LLM is used any time we need to work with untrusted content—content that might conceivably incorporate a prompt injection attack. It does not have access to tools, and is expected to have the potential to go rogue at any moment.
It has become increasingly clear that the process of creating robust systems that incorporate language models is going to look very similar to “traditional" programming. Sure, it might be an extremely “high level” programming language but it still carries many of the existing complexities that have always been present.
-
Humane, the secretive startup founded by ex-Apple software design chief Imran Chaudri, finally went public with Chaudri showing off their device for the first time at the TED conference last week…
Chaudri’s talk is centered on the premise that technology (mainly through the smartphone) has invaded all of our lives too much. The idea is that personalized artificial intelligence can be used to dramatically change how we interact with technology. Rather than proactively opening an app to do something, AI can be an ambient thing that is there when you need it, works in the background of your life, and mostly stays out of your way.
To make this a reality, Humane is introducing a new product: a wearable that resembles a rectangular pin badge. Chaudri is wearing one on his jacket pocket during the presentation. He sets out the vision of their product as something that is “screenless, seamless and sensing”.
There is something that is just fundamentally cool about Humane’s product—it just feels like a device from the future.
The problem is Chaudri’s insistence that their device is a replacement for smartphones. John Gruber recently wrote a great piece about this:
So far, it feels like Humane’s entire premise is founded on that same mistake: building a new device intended to replace our phones, without that new device being able to do any of the dozens of things we love to do on our phones that require a display. Apple Watch and AirPods thrive because they’re satellites to our iPhones, not ostensible replacements… Anything that attempts to establish a post-phone beachhead has to do the things we love to do with our phones, or entertain us in new ways that make us forget about them. I don’t see how a laser projector on a chest badge does that.
Humane is so close to building the product I have been dreaming of. But for them to succeed, they first need to accept that, until their device is ten times better than the smartphone, it won’t supersede smartphones as the center of personal computing.
Honestly, Humane should consider scrapping the whole projector idea and focus their efforts on making an incredible app. But if Chaudri really wants to build a viable hardware project, it must be an accessory to the smartphone.
There is a sense in which any hardware project is doomed from the start, though. If Humane were to ever create a wearable that sees widespread success, Apple will undoubtably sherlock the technology and incorporate it into a “next generation” Apple Watch—I am sure they already have a similar R&D project on the back burner, just in case.
At the end of the day, I am rooting for Humane but that doesn’t mean I am optimistic.
-
Launched last week to global users after initially being a subscriber-only addition, Snapchat’s new AI chatbot powered by OpenAI’s GPT technology is now pinned to the top of the app’s Chat tab where users can ask it questions and get instant responses. But following the chatbot’s rollout to Snapchat’s wider community, Snapchat’s app has seen a spike in negative reviews amid a growing number of complaints shared on social media.
Over the past week, Snapchat’s average U.S. App Store review was 1.67, with 75% of reviews being one-star, according to data from app intelligence firm Sensor Tower.
I was optimistic about Snapchat’s My AI feature when it initially launched last month:
Snapchat has a new AI chatbot. They are, in hindsight, the perfect company to experiment with personality-driven chat. They have a younger user base, less fear of upsetting a stodgy corporate audience, and a history of being an early adopter to strange new technologies.
That was evidently an incorrect analysis. While it might be true the Snapchat company is well positioned to experiment with emerging technologies, the Snapchat user base certainly doesn’t universally appreciate being subject to these experiments.
On further reflection, I think the general principal that I wrote about a few weeks ago in regards to Google can be applied more broadly:
Generative AI is a fundamentally new technology; therefore, you should allow that to guide you into new products that were impossible or impractical previously. Attempting to shoehorn AI into existing products will be awkward, at best.
At the very least, if you are committed to the ill-advised “shoehorn” strategy, you should make these new features optional, ideally opt-in. No one appreciates it when a well-known user interface suddenly changes—no matter the reason that prompted the change.
-
Here’s the basics: there’s a new track called “Heart on My Sleeve” by a TikTok user called @ghostwriter877 with AI-generated vocals that sound like Drake and The Weeknd.
[…]
This prompted Drake and The Weeknd’s label Universal Music Group to issue a sternly-worded statement about the dangers of AI, which specifically says that using generative AI infringes its copyrights.
[…]
The first legal problem with using AI to make a song with vocals that sound like they’re from Drake is that the final product isn’t a copy of anything… Instead, UMG and Getty Images and publishers around the world are claiming that collecting all the training data for the AI is copyright infringement
The bottom line is that there is no clear precedent in place dictating the way generative AI relates to existing copyright law. This leaves a bit of a grey zone that creators, for now, are free to explore.
In the wake of the AI-generated hit Heart on My Sleeve going viral with deepfakes of multi-platinum artists Drake and The Weeknd, pop star Grimes has invited her fans to create music with her voice.
On Sunday night she tweeted, “I’ll split 50% royalties on any successful AI generated song that uses my voice. Same deal as I would with any artist i collab with. Feel free to use my voice without penalty. I have no label and no legal bindings.”
A Discord server called AI Hub hosts a large community of AI music creators behind some of the most viral AI songs. This server was created on March 25 and now has over 21,000 users.
[…]
UTOP-AI, the album created by the Discord community, features original songs using AI-generated vocals from famous artists including Travis Scott, Drake, Baby Keem, and Playboi Carti. Qo, Snoop Dogg, and twenty other people involved in the AI Hub community worked on it.
This album puts into practice what drew Qo and Dogg to AI music in the first place—the ability to create material for artists they wish to hear more of.
-
Nearly six months after the launch of ChatGPT—after Bard, Bing, Claude, LLaMA, and StabilityLM are subsequently released—one after another, large user generated content companies are closing off access to their data for training AI models.
Stack Overflow, a popular internet forum for computer programming help, plans to begin charging large AI developers as soon as the middle of this year for access to the 50 million questions and answers on its service, CEO Prashanth Chandrasekar says.
Mike Isaac, The New York Times:
[Reddit] said on Tuesday that it planned to begin charging companies for access to its application programming interface, or A.P.I.
[…]
Mr. Huffman said Reddit’s A.P.I. would still be free to developers who wanted to build applications that helped people use Reddit… But for the A.I. makers, it’s time to pay up.
Twitter CEO Elon Musk threatened to sue Microsoft on Wednesday, accusing the software giant of illegally using the social media company’s data to train its artificial intelligence model.
[…]
Musk said in December that Twitter would “pause” OpenAI’s access to its database.
It is actually unlikely that new training data from any of these companies will be necessary any time soon. Language models need a huge amount of text in order to learn basic grammar, writing styles, and general facts. Specific, up-to-date information, on the other hand, is best integrated by plugging in external tools.
After a while, though, it will be necessary to update the foundation model’s training data. When that happens, large companies that are able to either pay for API access or strike data exchange deals will unequally benefit.
-
§ It was snowing on Tuesday and then 80 °F on Thursday. That just doesn’t feel like something that should even be possible.
§ We got three new female coturnix quails. They were mailed to us at about one month old, nearly their fully grown size. I was initially nervous about the idea of getting live birds shipped to me like this but I guess it’s pretty common, according to my local post office. They were all totally fine upon arrival and quickly adjusted to their new home.
§ I’ve discovered that my neighborhood has two bubble tea shops that recently opened within three miles of each other on the same street. I am certainly not complaining, but I would not have pegged my largely Eastern European retiree suburb as such a hot boba market.
§ Links
- Niche museums
- A list of programming playgrounds
- I especially like Wokwi for Arduino
- Also: Vercel’s new LLM playground
- Ruffle is a Flash Player emulator written in Rust
- Hue.tools is the best color utility I’ve found yet—it’s one thousand times better than Colorhexa and other similar sites
§ Recipes
The Chicago restaurant I miss most has got to be The Bad Apple. I finally broke down over the weekend and tried to recreate their Even Cowgirls Get The Blues burger. I used Kenji’s burger technique and then added blue cheese, arugula, caramelized onions, and hot pepper bacon jam.
It would certainly be better for my health if this meal didn’t turn out well but nope—there is no denying how good this was. I’m going to have to make it again ASAP.
-
Benjamin Mullin and Katie Robertson, The New York Times:
BuzzFeed is shutting down its news division as part of an effort to cut 15 percent of its work force, the company’s chief executive, Jonah Peretti, said Thursday in a memo to employees.
[…]
BuzzFeed will continue to publish news on HuffPost, which Mr. Peretti said in his memo was profitable and less dependent on social platforms. He added that the company was moving forward “only with parts of the business that have demonstrated their ability to add to the company’s bottom line.”
Peretti evidently does not appreciate the fact that BuzzFeed News' true value is not reflected by the revenue it generates. BuzzFeed News gives the entire “BuzzFeed” brand a degree of legitimacy and esteem it would not otherwise have.
Before News began publishing serious journalism and winning Pulitzers, BuzzFeed was (appropriately) synonymous with low-quality listicles, quizzes, and clickbait.
The entire conceit was that BuzzFeed.com was the “junk food” that funded important investigative journalism—what is BuzzFeed’s purpose without News?
In other words, BuzzFeed is loosing an essential part of its mullet, as Josh Marshall puts it:
The journalism played an even more niche, operational role. Buzzfeed mastered the distribution element of social media very, very fast. But it had listicles and cat photos and other stuff like that. That’s tons of traffic. But it’s not the prestige play that brings you top shelf premium ad dollars. The journalism was really a loss-leader in that calculus. GM or Bacardi isn’t going to sign on to the be the exclusive sponsor of your Grumpy Cat slideshow, even if millions see it. But put a Pulitzer in the mix and it’s a very different story. There was always a big mullet aspect to these plays: prestige up front (news reporting), party in the back (listicles and memes).
-
Speaking of open source language models…
Today, Stability AI released a new open-source language model, StableLM. The Alpha version of the model is available in 3 billion and 7 billion parameters, with 15 billion to 65 billion parameter models to follow. Developers can freely inspect, use, and adapt our StableLM base models for commercial or research purposes, subject to the terms of the CC BY-SA-4.0 license.
[…]
StableLM is trained on a new experimental dataset built on The Pile, but three times larger with 1.5 trillion tokens of content. We will release details on the dataset in due course. The richness of this dataset gives StableLM surprisingly high performance in conversational and coding tasks, despite its small size
Unlike LLaMA, the base model is completely free to use commercially. The instruction tuned model, however, is only licensed for noncommercial research.
We are also releasing a set of research models that are instruction fine-tuned. Initially, these fine-tuned models will use a combination of five recent open-source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT, and HH. These fine-tuned models are intended for research use only and are released under a noncommercial CC BY-NC-SA 4.0 license, in-line with Stanford’s Alpaca license.
This limitation will likely only be temporary, though, as Stability appears to be working on putting together a new instruction tuning / RLHF dataset that will presumably be permissibly licensed.
We will be kicking off our crowd-sourced RLHF program, and working with community efforts such as Open Assistant to create an open-source dataset for AI assistants.
Remember, instruction tuning is what allows your prompts to be natural and conversational. For example, you might prompt the base model with “here is a list of ten dog breeds: 1)” while you could prompt the instruction tuned model “write a list of ten dog breeds.”
Overall, this release is a huge deal if only because it creates the obvious Schelling point for future open source development work. When it was first released, Stable Diffusion was resource intensive and low quality. After a flurry of open source contributions, it quickly became the highest quality option while, at the same time, becoming efficient enough to run locally on an iPhone. If the same story occurs with StableLM, this will become a more important release than GPT-4.
-
From Bret Devereaux’s excellent series on the history and mechanics of farming:
In places where seed-drilling devices weren’t available, seeds were sown by the broadcast method. The ground was plowed, then the seeds were thrown out over the ground (literally cast broadly; this is where our term broadcast comes from); the ridges created by plowing would cause most of the seeds to fall into the grooves (called furrows; thus a ‘furrowed’ brow being one scrunched up to create ridges and depressions that looked like a plowed field), creating very rough rows of crops once those seeds sprouted. Then the land is then harrowed (where our sense of ‘harrowing‘ comes from – seriously, so much English idiomatic expressions are farming idioms, for obvious reasons), typically with rakes and hoes to bury the seeds by flattening out the ridges (but not generally entirely erasing them) in order to cover the seeds over once they had been placed with very loose clods of earth.
-
Today, we’re releasing Dolly 2.0, the first open source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.
Dolly 2.0 is a 12B parameter language model based on the EleutherAI pythia model family and fine-tuned exclusively on a new, high-quality human generated instruction following dataset, crowdsourced among Databricks employees.
[…]
databricks-dolly-15k
contains 15,000 high-quality human-generated prompt / response pairs specifically designed for instruction tuning large language models. Under the licensing terms fordatabricks-dolly-15k
, anyone can use, modify, or extend this dataset for any purpose, including commercial applications.To the best of our knowledge, this dataset is the first open source, human-generated instruction dataset specifically designed to make large language models exhibit the magical interactivity of ChatGPT.
The release of the “databricks-dolly-15k” instruction tuning dataset under a permissive license is a much bigger deal than the trained model itself.
Language models will no doubt continue to face questions regarding training data provenance. Any and all datasets that are open, high quality, and free of copyright and ethics concerns will only improve the perceived legitimacy of future models.
RedPajama, the open source 1.2 trillion token pre-training dataset, is a big deal for the same reason.
The RedPajama base dataset is a 1.2 trillion token fully-open dataset created by following the recipe described in the LLaMA paper.
[…]
We aim to create a fully open-source reproduction of LLaMA, which would be available for commercial applications, and provide a more transparent pipeline for research.
Without a doubt, someone will soon train an open source language model on RedPajama’s base data and then apply RLHF fine-tuning using databricks-dolly-15k. This would be the first instruction-tuned language model that is fully unencumbered by copyright concerns.
-
Nico Grant, The New York Times:
A.I. competitors like the new Bing are quickly becoming the most serious threat to Google’s search business in 25 years, and in response, Google is racing to build an all-new search engine powered by the technology.
[…]
The new features, under the project name Magi, …would offer users a far more personalized experience than the company’s current service, attempting to anticipate users’ needs.
[…]
The system would learn what users want to know based on what they’re searching when they begin using it. And it would offer lists of preselected options for objects to buy, information to research and other information… Magi would keep ads in the mix of search results. Search queries that could lead to a financial transaction
[…]
Last week, Google invited some employees to test Magi’s features… Google is expected to release the tools to the public next month and add more features in the fall, according to the planning document.
I have been critical of Google’s AI strategy. Generative AI is a fundamentally new technology; therefore, you should allow that to guide you into new products that were impossible or impractical previously. Attempting to shoehorn AI into existing products will be awkward, at best.
While we don’t know many details of what Magi will ultimately look like, I am pleasantly surprised Google appears to be taking a blank-slate approach to its design and development.
I would love to see Google bring back the strategy they used with Inbox—create a playground to experiment with new ideas, unencumbered by tradition. When the time was right, Google took what they learned from Inbox and integrated it into Gmail. Maybe Magi will ultimately be merged into Google Search. Even so, Magi still would have played a valuable role as a test lab. If I am right, though, and generative AI will be most successful as a new product, Google would be well positioned for that, too.
-
§ This week was a nice sneak preview of summer. Every day was in the mid-to-high-70s and sunny. Most days Caroline and I would be outside from the time we got home from work until sunset. We got a lot of yard work done—weeding, edging, expanding the garden beds. We went through five yards of compost in two days.
§ The seeds I planted a couple of weeks ago have all sprouted—first the tomatoes and tomatillos, then peas and basil. Finally, a few days later, all of the peppers popped up.
I also started some summer squash and groundcherry seeds. I am especially excited about the later after eating them for the first time last summer.
§ Until now, you have only known Winter Blog. Summer Blog will have much more gardening. Don’t say you weren’t warned.
§ Succession episode three!
I don’t think there is anything I can say that wouldn’t be a massive spoiler but… wow—watch it.
§ Links
- Animated children’s drawings
- AgentGPT is a browser-based implementation of Auto-GPT
- A full-body keyboard
- Floor 796
- LQML is a programming language for LLM prompting
§ Recipes
- We made pizza in the Ooni more times than I care to admit.
- Earlier this week, I purchased a small kaffir lime plant which prompted me to make Kenji López-Alt’s beef with basil recipe again. Adding the lime leaves made a bigger difference than I would have expected!
-
In the 2050s, Delos Inc. operates several theme parks, including the American Old West-themed Westworld. Each environment is populated by the “Hosts”, biomechanical robots indistinguishable from humans. The Hosts are programmed to fulfill the guests' every desire… The park’s operators create narratives for these Hosts to carry out while interacting with guests
Joon Sung Park et al. at Stanford:
In this paper, we introduce generative agents—computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversation.
[…]
We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time.
2050 seems like a pretty good prediction after all.
-
From the GitHub repository:
Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, chains together LLM “thoughts”, to autonomously achieve whatever goal you set.
The idea is that you prompt Auto-GPT with a goal—buy me the best E-bike—and then a high-level “agent” breaks down this goal into a hierarchy of tasks—research reviews, compare prices, find distributors, etc—the primary agent then delegates “sub-agents” to complete each task.
Think of it as giving GPT-4 the ability to recursively call itself.
Additionally, each agent has access to a variety of tools. For example, they can use the internet, execute code, and store information in short & long term memory.
Auto-GPT’s developer Toran Richards, in an interview with Vice:
The ability to function with minimal human input is a crucial aspect of Auto-GPT. It transforms a large language model from what is essentially an advanced auto-complete, into an independent agent capable of carrying out actions and learning from its mistakes
-
Kyle Wiggers, Devin Coldewey, and Manish Singh at TechCrunch:
AI research startup Anthropic aims to raise as much as $5 billion over the next two years to take on rival OpenAI and enter over a dozen major industries, according to company documents obtained by TechCrunch.
A pitch deck for Anthropic’s Series C fundraising round discloses these and other long-term goals for the company
[…]
“These models could begin to automate large portions of the economy,” the pitch deck reads. “We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.”
[…]
Dario Amodei, the former VP of research at OpenAI, launched Anthropic in 2021 as a public benefit corporation… Amodei split from OpenAI after a disagreement over the company’s direction, namely the startup’s increasingly commercial focus.
[…]
“Anthropic has been heavily focused on research for the first year and a half of its existence, but we have been convinced of the necessity of commercialization, which we fully committed to in September [2022],” the pitch deck reads.
There is something vaguely sad about Anthropic following OpenAI in adopting a commercial-first perspective. As stated in the quote above, Anthropic was initially founded as a counter response to OpenAI’s commercialization.
Anthropic does not even seem particularly adept at generating product hype—until now, I was under the impression they were intentionally trying to remain low-profile.
Despite all of this, I think it is a smart business move to make—OpenAI can’t be the only company selling access to state-of-the-art generative AI APIs—I guess I just wish it was another company that filled the void and that Anthropic was more devoted to maintaining its founding directive.
-
Google is reshuffling the reporting structure of its virtual assistant unit — called Assistant — to focus more on Bard, the company’s new artificial intelligence chat technology.
[…]
The new leadership changes suggest that the Assistant organization may be planning on integrating Bard technology into similar products in the future.
The most critical advantage Google, Amazon, and Apple have over OpenAI is that they all have existing smart assistants integrated into customer’s devices. I would love to see Google take the lead in upgrading their assistant with generative AI capabilities.
Miles Kruppa, Wall Street Journal:
Google plans to add conversational artificial-intelligence features to its flagship search engine, Chief Executive Officer Sundar Pichai said
[…]
“Will people be able to ask questions to Google and engage with LLMs in the context of search? Absolutely,” Mr. Pichai said.
[…]
Google is testing several new search products, such as versions that allow users to ask follow-up questions to their original queries, Mr. Pichai said. The company said last month that it would begin “thoughtfully integrating LLMs into search in a deeper way,” but until now hadn’t detailed plans to offer conversational features.
I don’t know… I haven’t used Bing as an “AI search engine” in at least a month. Language models—while adjacent to traditional search engines—are an entirely new technology. As time goes on, I am less convinced integrating them into existing products is the best approach.
Maybe, when it comes to search, Google should strive to make the best search engine it can. Down-rank SEO spam, improve operators, and innovate with new features. Don’t reimagine search, refine search.
To be clear, I think they should continue to develop and improve Bard—but let it be its own thing, don’t just thoughtlessly tack it onto all of your old stuff.
-
I like to think of language models like ChatGPT as a calculator for words.
This is reflected in their name: a “language model” implies that they are tools for working with language. That’s what they’ve been trained to do, and it’s language manipulation where they truly excel.
Want them to work with specific facts? Paste those into the language model as part of your original prompt!
[…]
A calculator for words is an incredibly powerful thing.
“A calculator for words” is a great analogy for language models. It is the framing the ultimately clicked for me when ChatGPT first made it clear that generative AI was going to quickly change some of our longstanding education paradigms.
From a post I wrote in December 2022:
The most exciting path forward is one where we frame Large Language Models as “a calculator for text”. Just as the invention of pocket calculators was a giant disruption that forced us to re-evaluate our approach to mathematics education, language models will continue to force us to re-evaluate our approach to research and writing. Done correctly this will open the door for us to learn more quickly, use our time more effectively, and progress further than we possibly could before.
subscribe via RSS