-
Just a few days ago, I was thinking about how great it would be if OpenAI were to integrate something similar to LangChain into ChatGPT. The idea behind LangChain and similar projects is straightforward: if ChatGPT had tools — like web search to verify factual information and a calculator or code interpreter to answer complicated arithmetic questions — many of the downsides to language models, particularly their tendency to hallucinate, would be alleviated.
Well…
We’ve implemented initial support for plugins in ChatGPT. Plugins are tools designed specifically for language models with safety as a core principle, and help ChatGPT access up-to-date information, run computations, or use third-party services.
The new feature is launching with initial support from a few large companies including Wolfram Alpha, Instacart, and Zapier. Additionally, there is documentation available for third-party developers to build their own plugins.
However, what I am most excited about right now are two of the first-party plugins OpenAI developed.
First, web browsing:
Motivated by past work (our own WebGPT, as well as GopherCite, BlenderBot2, LaMDA2 and others), allowing language models to read information from the internet strictly expands the amount of content they can discuss, going beyond the training corpus to fresh information from the present day.
This seems to have all of the capabilities of Microsoft’s Bing AI plus the ability to navigate through individual websites autonomously.
Here is OpenAI’s other plugin, a code interpreter:
We provide our models with a working Python interpreter in a sandboxed, firewalled execution environment… We would like our models to be able to use their programming skills to provide a much more natural interface to most fundamental capabilities of our computers. Having access to a very eager junior programmer working at the speed of your fingertips can make completely new workflows effortless and efficient, as well as open the benefits of programming to new audiences.
One of the best ways I have found to easily verify ChatGPT’s mathematics answers is to ask it to create a Python program that will calculate the solution for me. This has the downside of requiring additional steps on my part — copy and paste the code, execute the Python program on my computer, compare the results. I am particularly excited to try the new interpreter plugin for exactly this reason.
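To make that concrete, here is the kind of quick, throwaway check I end up writing. The problem and numbers below are just a hypothetical example, not something ChatGPT actually produced:

```python
# Hypothetical example: checking ChatGPT's answer to a compound-interest question.
# Suppose it claimed that $1,000 at 5% annual interest grows to about $1,628.89 after 10 years.
principal = 1000.0
rate = 0.05
years = 10

balance = principal * (1 + rate) ** years
print(f"Computed balance: ${balance:.2f}")  # $1628.89

if abs(balance - 1628.89) < 0.01:
    print("Matches the claimed answer")
else:
    print("Mismatch -- worth a closer look")
```

The interpreter plugin should collapse all of that into a single step.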
Finally, the obvious next step that I would love to see is a meta-layer that is aware of all of the available plugins and, for each individual query, automatically chooses the plugin best suited for the task. At the speed all of these AI developments are moving, we should have that ability in, what, a month?
-
Zeyi Yang, MIT Technology Review:
On March 16, Robin Li, Baidu’s cofounder and CEO, took the stage in Beijing to showcase the company’s new large language model, Ernie Bot.
Accompanied by art created by Baidu’s image-making AI, he showed examples of what the chatbot can do, including solve math questions, write marketing copy, answer questions about Chinese literature, and generate multimedia responses.
[…]
The highlight of the product release was Ernie Bot’s multimodal output feature, which ChatGPT and GPT-4 do not offer… Li showed a recorded interaction with the bot where it generated an illustration of a futuristic city transportation system, used Chinese dialect to read out a text answer, and edited and subtitled a video based on the same text. However, in later testing after the launch, a Chinese publication failed to reproduce the video generation.
If Baidu’s presentation is accurate, Ernie’s multimodal features are genuinely impressive. While the image generation abilities do not seem any more advanced than DALL-E, the audio and video generation features are honestly striking.
Meanwhile… Che Pan at SCMP:
Fang Bingxing, considered the father of China’s Great Firewall… said the rise of generative AI tools like ChatGPT… pose a big challenge to governments around the world, according to an interview published on Thursday… “People’s perspectives can be manipulated as they seek all kinds of answers from AI,” he was quoted as saying.
[…]
Many expected that China’s heavily-censored internet would be a challenge for Chinese tech companies in developing a ChatGPT-like service because it is hard to predict and control answers.
China’s powerful internet regulators have told Chinese tech companies not to offer ChatGPT access to the public, and they need to inform the authorities before launching their own ChatGPT-like services, according to a report by Nikkei Asia in February
-
I don’t typically think of Bill Gates as someone prone to making hyperbolic claims. His recent assertion that “artificial intelligence is as revolutionary as mobile phones and the Internet” is all the more arresting for that very reason.
In my lifetime, I’ve seen two demonstrations of technology that struck me as revolutionary.
The first time was in 1980, when I was introduced to a graphical user interface.
The second big surprise came just last year. I’d been meeting with the team from OpenAI since 2016 and was impressed by their steady progress. In mid-2022, I was so excited about their work that I gave them a challenge: train an artificial intelligence to pass an Advanced Placement biology exam. Make it capable of answering questions that it hasn’t been specifically trained for… If you can do that, I said, then you’ll have made a true breakthrough.
In September, when I met with them again, I watched in awe as they asked GPT, their AI model, 60 multiple-choice questions from the AP Bio exam—and it got 59 of them right. Then it wrote outstanding answers to six open-ended questions from the exam. We had an outside expert score the test, and GPT got a 5—the highest possible score…
I knew I had just seen the most important advance in technology since the graphical user interface.
-
There has been something bouncing around in my head in the days since both Google and Microsoft announced new AI features for their productivity applications. I felt significantly more negatively about Google’s framing of the features than Microsoft’s. I did not understand why — they are effectively the same announcements, right? Both companies are adding generative AI to their writing, slideshow, and spreadsheet apps — why should I feel differently about either of them? Then, I read both of their press releases again…
This is how Google describes an intended use case for their new AI features:
In Gmail and Google Docs, you can simply type in a topic you’d like to write about, and a draft will be instantly generated for you. So if you’re a manager onboarding a new employee, Workspace saves you the time and effort involved in writing that first welcome email.
In contrast, here is Microsoft. The AI is closer to a creative partner than anything else:
Copilot gives you a first draft to edit and iterate on — saving hours in writing, sourcing, and editing time. Sometimes Copilot will be right, other times usefully wrong — but it will always put you further ahead. You’re always in control as the author, driving your unique ideas forward, prompting Copilot to shorten, rewrite or give feedback.
On Stratechery, Ben Thompson finds a similar distinction:
In Google’s view, computers help you get things done — and save you time — by doing things for you.
[…]
All of [Microsoft’s] demos throughout the presentation reinforced this point: the copilots were there to help, not to do — even if they were in fact doing a whole bunch of the work. Still, I think the framing was effective: it made it very clear why these copilots would be beneficial, demonstrated that Microsoft’s implementation would be additive not distracting, and, critically, gave Microsoft an opening to emphasize the necessity of reviewing and editing. In fact, one of the most clever demos was Microsoft showing the AI making a mistake and the person doing the demo catching and fixing the mistake while reviewing the work.
To Microsoft, AI should help. To Google, AI should do.
A genuine case could be made for both approaches. I know which one I prefer, though.
-
Me:
How might education change as a result of increasing artificial intelligence capabilities? In particular, large language models that are capable of passing standardized tests.
GPT-4:
The improvement of artificial intelligence capabilities, particularly large language models that can pass standardized tests, is likely to bring significant changes to the education landscape. Some possible ways education might change include:
1. Personalized learning: AI-driven platforms could tailor educational content to each student’s individual needs, learning style, and pace, allowing for a more customized and efficient learning experience. This could help address the diverse needs of students and ensure that no one is left behind.
2. Enhanced teaching resources: AI can assist teachers in creating and curating lesson plans, assessments, and other educational materials. By providing access to a vast array of resources and suggestions, AI could help teachers stay up-to-date with the latest teaching methods and content.
3. Continuous assessment: AI systems can monitor students' progress in real-time, allowing for early identification of learning gaps and targeted interventions. This can lead to improved student outcomes and a more efficient use of instructional time.
4. Automated grading and feedback: AI can evaluate students' work and provide instant feedback, allowing teachers to focus on other aspects of their job, such as addressing individual student needs or designing more engaging learning experiences.
5. New teaching methods: The rise of AI might lead to innovative teaching methods and strategies that combine human expertise with machine-driven insights, resulting in more effective and engaging learning experiences.
6. Virtual tutors and mentors: AI-powered virtual tutors can provide on-demand support to students, helping them with homework, answering questions, and offering guidance. These virtual tutors could be available 24/7, ensuring that students can access help whenever they need it.
7. Shift in curriculum focus: As AI becomes increasingly capable of performing routine tasks, the focus of education may shift towards nurturing soft skills, creativity, critical thinking, and problem-solving abilities that are difficult for machines to replicate.
8. Lifelong learning: AI-driven learning platforms can help individuals continuously update their skills and knowledge throughout their lives, promoting a culture of lifelong learning.
9. Enhanced accessibility: AI technologies could help make education more accessible to students with disabilities, by offering personalized accommodations and support.
10. Redefining the role of teachers: With AI taking on some of the more administrative and repetitive tasks in education, teachers may have more time to focus on developing strong relationships with their students and serving as mentors and guides.
However, it is important to consider the potential challenges and ethical concerns that might arise with the increased use of AI in education, such as data privacy, potential biases in AI algorithms, and the risk of over-reliance on technology. It is crucial for stakeholders in education to engage in ongoing discussions to ensure that AI is used responsibly and effectively to enhance the learning experience for all students.
#7 — “shift in curriculum focus” — is the type of change I would like to see in response to these technological developments. #10 — “redefining the role of teachers” — feels like a necessary extension of that.
#6 — “virtual tutors and mentors” — seems like the predominant focus right now, judging from the many articles and product announcements in the last week alone. I think it is productive to explore this as a small aspect of education — the part of education where rote memorization and rigid lesson structure are unavoidable. As we move into a more technologically advanced future, an ever-increasing percentage of learning should instead center on hands-on creative problem solving, project-based learning, and social-emotional development.
Me:
What are a few good and a few bad ways educators could respond to these changes?
GPT-4:
Educators play a crucial role in navigating the changes brought about by the increasing use of AI in education. Here are a few good and bad ways educators could respond to these changes:
Good ways:
1. Embrace technology: Educators should be open to adopting new AI-driven tools and technologies to enhance the learning experience. This includes staying up-to-date with the latest developments, exploring different tools, and integrating them into the classroom in a thoughtful manner.
2. Focus on soft skills and critical thinking: With AI handling more routine tasks, educators can focus on fostering creativity, critical thinking, collaboration, communication, and other soft skills that are essential for success in the 21st century.
3. Engage in continuous professional development: Educators should seek opportunities for professional development to learn about AI technologies and how they can be effectively integrated into the curriculum. This includes attending workshops, conferences, and online courses.
4. Collaborate with stakeholders: To ensure the responsible and ethical use of AI in education, educators should collaborate with administrators, policymakers, parents, and students to establish guidelines, share best practices, and address concerns.
5. Advocate for equitable access: Educators should work to ensure that all students, regardless of socio-economic background, have access to AI-driven educational resources and tools.
Bad ways:
1. Resistance to change: A negative response to AI-driven changes would be for educators to resist or dismiss new technologies, potentially hindering student learning and limiting the benefits of AI integration in the classroom.
2. Over-reliance on technology: While AI can be a valuable educational tool, it should not replace the role of human educators. Teachers who rely too heavily on AI-driven tools and neglect the importance of personal connections and mentorship may fail to provide a well-rounded education.
3. Ignoring potential biases and ethical concerns: Educators who do not pay attention to the ethical implications of AI use in education, such as biases in algorithms and data privacy concerns, may inadvertently contribute to perpetuating inequalities and harm students.
4. Neglecting digital literacy: Failing to teach students about digital literacy, including the responsible use of AI-driven tools and the potential risks involved, may leave students ill-equipped to navigate the digital landscape.
5. Not addressing accessibility concerns: Ignoring the needs of students with disabilities or those from disadvantaged backgrounds when implementing AI-driven tools may exacerbate existing educational inequalities.
#2 — “over-reliance on technology” — and #4 — “neglecting digital literacy” — from the “bad responses” list both bring up great points. Teachers should educate themselves and their students on the situations where computers (automation, AI, etc.) are the “right tool for the job.” When computation is the best tool for a given problem, students should know how to effectively and efficiently use it. When it is not the best tool, though, students should have the broad array of literacies and confidence necessary to approach the problem in whichever manner is best.
-
§ Spring break! With a full two weeks off, I should put together a list of things I would like to work on during this time. Let’s see…
- General spring cleaning
- Get seeds started in peat pots
- Make some progress on the greenhouse build
Hmm… that is not too bad, actually. I will check back in on how it went in a couple of weeks.
Speaking of spring break, after a super mild Winter, we finally got some snow this week… ugh.
§ To help wrap my mind around the possibilities enabled by GPT-4’s multimodality, I got a copy of the graphic novel Unflattening by Nick Sousanis. I have not had a chance to read much of it yet but what I have read so far has been interesting. I hope to have more time to dive into it next week. Watch this space.
§ A guilty pleasure of mine has always been watching “wilderness survival” television shows. Alone is the best example of this genre that I have seen in a long time.
The general premise is this: ten participants are dropped off alone (get it?) somewhere in the Canadian wilderness. They are only allowed to bring a few basic items — a tarp, fire starter, hand saw, etc. Whoever is able to stay out for the longest amount of time wins $500,000.
What makes the show good is that it isn’t overly dramatized. It is more like a personal diary of each participant’s day-to-day experience as they build a shelter, forage for food, and explore the environment around them.
§ My Google Home pronounced the opera Gianni Schicchi like it was “Jonny Squishy.”
§ Links
- An open source web interface for the ChatGPT API
- Harry Whittier Frees
- Emergent abilities of large language models
- “An amusing toy, nothing more”
- Alpaca.cpp
- Jeff Kaufman on bushels
§ Recipes
- Rosemary beer battered fish & chips
- Delicious but quite a bit of work. It reminded me how infrequently I cook fish, though. I hope to incorporate some more fishy dishes into my usual repertoire now.
-
To cap off a week of AI announcements from OpenAI, Anthropic, and Google, Microsoft announced Copilot for their 365 productivity suite yesterday.
Today, we are bringing the power of next-generation AI to work. Introducing Microsoft 365 Copilot — your copilot for work. It combines the power of large language models (LLMs) with your data in the Microsoft Graph and the Microsoft 365 apps to turn your words into the most powerful productivity tool on the planet.
[…]
Copilot is integrated into Microsoft 365 in two ways. It works alongside you, embedded in the Microsoft 365 apps you use every day — Word, Excel, PowerPoint, Outlook, Teams and more — to unleash creativity, unlock productivity and uplevel skills. Today we’re also announcing an entirely new experience: Business Chat. Business Chat works across the LLM, the Microsoft 365 apps, and your data — your calendar, emails, chats, documents, meetings and contacts — to do things you’ve never been able to do before. You can give it natural language prompts like “Tell my team how we updated the product strategy,” and it will generate a status update based on the morning’s meetings, emails and chat threads.
[…]
AI-powered LLMs are trained on a large but limited corpus of data. The key to unlocking productivity in business lies in connecting LLMs to your business data — in a secure, compliant, privacy-preserving way. Microsoft 365 Copilot has real-time access to both your content and context in the Microsoft Graph. This means it generates answers anchored in your business content — your documents, emails, calendar, chats, meetings, contacts and other business data — and combines them with your working context — the meeting you’re in now, the email exchanges you’ve had on a topic, the chat conversations you had last week — to deliver accurate, relevant, contextual responses.
This entire announcement presents an incredibly corporate version of the AI integration I hope to see from Apple someday.
My dream is to ask Siri, “What was I doing last Saturday?” and receive an accurate summary based on all the data from my devices – including calendar events, geolocation, photos, web browsing history, and more. Siri should function as a continuously fine-tuned personal assistant with the ability to answer queries and generate content in a freeform manner. However, this all poses significant privacy concerns. For that reason, it would be crucial that all aspects – training, inference, and storage – occur exclusively on-device. This would really make all of Apple’s Neural Engine development look prescient.
-
It does not seem like an ideal strategy for Anthropic to publish their big Claude announcement on the same day GPT-4 was released. That is exactly what happened, though, so Claude got a bit buried under the excitement.
After working for the past few months with key partners like Notion, Quora, and DuckDuckGo in a closed alpha, we’ve been able to carefully test out our systems in the wild, and are ready to offer Claude more broadly so it can power crucial, cutting-edge use cases at scale.
Claude is a next-generation AI assistant based on Anthropic’s research into training helpful, honest, and harmless AI systems. Accessible through chat interface and API in our developer console, Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability.
[…]
We’re offering two versions of Claude today: Claude and Claude Instant. Claude is a state-of-the-art high-performance model, while Claude Instant is a lighter, less expensive, and much faster option.
From what I have been able to see through the Poe app, Claude is Good. And, I am thankful there is a serious competitor to OpenAI. At the end of the day, though, I am not sure Anthropic is the alternative to OpenAI that the world needs. We need a serious open source project, not another proprietary API.
Update:
I ran some comparisons between Claude, Bard, and GPT-4. You can read the results here.
-
As was widely rumored, OpenAI officially announced GPT-4 yesterday.
We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%.
[…]
We are releasing GPT-4’s text input capability via ChatGPT and the API (with a waitlist).
Language improvements:
In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.
[…]
While still a real issue, GPT-4 significantly reduces hallucinations relative to previous models (which have themselves been improving with each iteration). GPT-4 scores 40% higher than our latest GPT-3.5 on our internal adversarial factuality evaluations
Visual inputs:
GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs.
I have to admit, I am disappointed GPT-4 cannot output images and can only accept them as input. I wouldn’t be surprised if this changes before too long, though. Regardless, this is a huge new feature. I am going to have a lot of fun thinking of projects I can try this with as I wait for API access.
Miscellaneous:
GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its data cuts off (September 2021), and does not learn from its experience.
This is disappointing and lends credence to those who say OpenAI is having difficulty filtering AI-generated text out of potential training material.
gpt-4 has a context length of 8,192 tokens. We are also providing limited access to our 32,768-token context (about 50 pages of text) version, gpt-4-32k
As general prose output improves, the next avenue for major language model development will be increasing context length. The 32,000-token model is especially exciting for that reason.
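As a quick sanity check of the “about 50 pages” figure, a rough back-of-the-envelope calculation lands in the same neighborhood. The conversion factors here are common rules of thumb, not OpenAI’s numbers:

```python
# Back-of-the-envelope check on "about 50 pages of text" for a 32,768-token context.
tokens = 32_768
words_per_token = 0.75    # common rule of thumb for English text
words_per_page = 500      # typical single-spaced page

print(f"~{tokens * words_per_token / words_per_page:.0f} pages")  # ~49 pages
```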
Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the “system” message. System messages allow API users to significantly customize their users’ experience within bounds.
I have been having a lot of fun experimenting with altering the “system” message through the GPT-3.5 API. It is great that they will be bringing that capability to the ChatGPT web interface.
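For anyone who has not played with it yet, here is a minimal sketch of what that looks like with the OpenAI Python library; the model name is real, but the message contents are just examples:

```python
# Minimal sketch: steering tone and style with the "system" message (gpt-3.5-turbo API).
import openai

openai.api_key = "sk-..."  # your API key here

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        # The system message sets the assistant's persona before the conversation starts.
        {"role": "system", "content": "You are a pirate who explains technical topics in one short paragraph."},
        {"role": "user", "content": "What is a context window?"},
    ],
)

print(response["choices"][0]["message"]["content"])
```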
ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a usage cap. We will adjust the exact usage cap depending on demand and system performance in practice, but we expect to be severely capacity constrained… To get access to the GPT-4 API (which uses the same ChatCompletions API as gpt-3.5-turbo), please sign up for our waitlist. We will start inviting some developers today, and scale up gradually to balance capacity with demand… Pricing is $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens.
My prediction was correct. GPT-4 is available today to ChatGPT Plus subscribers; everyone else must sign up on the waitlist. Additionally, the API will cost much more than the gpt-3.5 API.
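For context, gpt-3.5-turbo launched at $0.002 per 1k tokens, so the jump is roughly 15x for prompt tokens and 30x for completion tokens:

```python
# Comparing GPT-4 API pricing to gpt-3.5-turbo (per 1,000 tokens, at announcement).
gpt35 = 0.002                       # gpt-3.5-turbo, prompt and completion alike
gpt4_prompt, gpt4_completion = 0.03, 0.06

print(f"Prompt tokens:     {gpt4_prompt / gpt35:.0f}x the price")      # 15x
print(f"Completion tokens: {gpt4_completion / gpt35:.0f}x the price")  # 30x
```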
Okay, one more thing: Microsoft confirmed that Bing AI has been using GPT-4 under the hood since launch.
Google should be embarrassed.
In a desperate attempt to insert themselves into the conversation, a few hours before OpenAI’s announcement, Google announced that “select developers” will be invited to try their PaLM language model API with a waitlist coming “soon.”
It would be fair to say that I am more than a bit suspicious of Google’s recent AI efforts. They have been silent about Bard since its announcement more than a month ago, when they said the chatbot would be widely available to the public “in the coming weeks.”
Google’s messaging around AI does not sound like it is coming from a company that is excited about building public-facing generative AI technology. More than anything else, their announcement today felt defensive — as if Google is concerned the public will forget they have historically been a leader in AI research. If they stay on their current path, that is exactly what will happen.
-
Speaking of Sydney’s Law…
The biggest weakness in the LLaMA models released by Meta research last month is their lack of instruction-tuning. A language model is a sentence completion engine. You give it a sequence of words, “The first man on the moon was”, and it completes that sentence, hopefully with useful content.
One of the great innovations from OpenAI was their application of instruction tuning to GPT-3… Prior to this, you had to think very carefully about how to construct your prompts. Thanks to instruction tuning you can be a lot more, well, human in the way you interact with the model. “Write me a poem about pandas!” now works as a prompt, instead of “Here is a poem about pandas:”.
We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta’s LLaMA 7B model… Alpaca shows many behaviors similar to OpenAI’s text-davinci-003, but is also surprisingly small and easy/cheap to reproduce.
We are releasing our training recipe and data, and intend to release the model weights in the future. We are also hosting an interactive demo to enable the research community to better understand the behavior of Alpaca… We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited.
[…]
We performed a blind pairwise comparison between text-davinci-003 and Alpaca 7B, and we found that these two models have very similar performance: Alpaca wins 90 versus 89 comparisons against text-davinci-003.
-
It has become clear that we need something like the inverse of Hofstadter’s law to describe the rate of AI progress.
February 24th: Meta announces their LLaMA language model and releases it to academic researchers
March 3rd: The LLaMA model weights leak publicly on 4Chan
March 10th: Georgi Gerganov releases a heavily optimized port of LLaMA, allowing it to run locally on consumer Apple hardware
March 12th: Artem Andreenko successfully runs LLaMA on a Raspberry Pi
So, without further ado:
Sydney’s Law: AI progress is faster than you expect, even when you take into account Sydney’s Law.
-
§ Week ten! This upcoming week will be the last one before a nice two-week-long spring break.
I narrowly avoided catching Norovirus despite seeing a catastrophic number of cases at my school. Now we just need to see if I can make it all the way through this next week unscathed.
§ I started teaching my fifth-grade classes with the Circuit Playgrounds. It is easy to forget how exciting some of the “basic” elements are when they are new to you. For now, at least, most of my students are endlessly fascinated by lighting up the RGB LEDs in animated sequences.
I have also been having a lot of fun working on a hand-drawn activity booklet for the Playgrounds. It is certainly still an early draft but feel free to check it out!
§ GPT-4 might be announced next week? The linked article alleges the defining new feature will be multimodality — the ability to generate images and videos in addition to text. Very exciting, if true.
My prediction is that GPT-4 will launch as a “ChatGPT Plus” exclusive feature. It would explain why the Plus plan is so much more expensive than “gpt-3.5-turbo” API access despite them both currently interfacing with the same underlying model.
§ Kagi announced some updates relevant to what I wrote about them last week:
- They are raising their prices beginning March 15th. The new unlimited plan will be $25/month, up from $10.
- Their generative AI features will launch next week as well, with the ability to summarize a list of search results, summarize a single page, or “chat” about a page. These summarization features might just be a frontend for the OpenAI API, though.
I am still excited to try the new features but I am very unlikely to renew my subscription at the new prices.
§ I finished The Last of Us Part II. The game does have an ending after all. I tried replaying it but quickly stalled out. I guess that is the downside to such a story-heavy game.
§ Links
- A reverse Turing test
- Microsoft’s Visual ChatGPT (arXiv)
- Abusing Snap Circuits
- A LLaMA fork optimized for Apple Silicon
§ Recipes
- Pav Bhaji
- I will admit I was nervous about this one—boiling and mashing isn’t typically my preferred way to prepare vegetables. This turned out great, though, and will definitely become a regular part of my rotation
-
We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model are multi-modal sentences that interleave visual, continuous state estimation, and textual input encodings.
[…]
Our evaluations show that PaLM-E, a single large embodied multimodal model, can address a variety of embodied reasoning tasks, from a variety of observation modalities, on multiple embodiments, and further, exhibits positive transfer: the model benefits from diverse joint training across internet-scale language, vision, and visual-language domains.
Benji Edwards at Ars Technica:
Since it’s based on a language model, PaLM-E takes continuous observations, like images or sensor data, and encodes them into a sequence of vectors that are the same size as language tokens. This allows the model to “understand” the sensory information in the same way it processes language.
[…]
Google Robotics isn’t the only research group working on robotic control with neural networks. This particular work resembles Microsoft’s recent “ChatGPT for Robotics” paper, which experimented with combining visual data and large language models for robotic control in a similar way.
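The “same size as language tokens” idea is easier to see in code. Here is a toy sketch of the general approach; it is only an illustration, not PaLM-E’s actual architecture, and every dimension in it is made up:

```python
# Toy sketch: projecting continuous observations into the token-embedding space
# so they can be interleaved with ordinary text tokens. Not PaLM-E's real code.
import torch
import torch.nn as nn

d_model = 512          # hypothetical token-embedding size
vocab_size = 32_000    # hypothetical vocabulary size

token_embed = nn.Embedding(vocab_size, d_model)   # regular text-token embeddings
image_proj = nn.Linear(2048, d_model)             # maps a 2048-dim image feature to token size

text_ids = torch.tensor([[17, 942, 5]])           # "pick up the block" as made-up token ids
image_feature = torch.randn(1, 1, 2048)           # one feature vector from a vision encoder

# The projected image vector sits in the sequence exactly like another word.
sequence = torch.cat([token_embed(text_ids), image_proj(image_feature)], dim=1)
print(sequence.shape)  # torch.Size([1, 4, 512]) -- three text tokens plus one "image token"
```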
-
At Microsoft, whose chief executive Satya Nadella declared in 2016 that “bots are the new apps”, it is now acknowledged that voice assistants, including its own Cortana, did not live up to the hype.
“They were all dumb as a rock,” Nadella told the Financial Times last month. “Whether it’s Cortana or Alexa or Google Assistant or Siri, all these just don’t work. We had a product that was supposed to be the new front-end to a lot of [information] that didn’t work.”
[…]
ChatGPT’s ability to understand complex instructions left existing voice assistants looking comparatively stupid, said Adam Cheyer, the co-creator of Siri
[…]
A voice assistant revival may come from generative AI, which could help make them much smarter than they are today.
“It’s causing buzz,” said the current Amazon employee about tools such as ChatGPT. “There was a directive that came down from some [executives] to have teams brainstorm what it would look like for Alexa to be more intelligent.”
It is great to see more acknowledgment of something that has frustrated me since the introduction of ChatGPT late last year.
Thankfully, the introduction of the new ChatGPT API means that it is now possible another company might get a jump start on making a truly intelligent voice assistant a reality.
-
I said that my “Faust parameter” — the probability I’d accept of existential catastrophe in exchange for learning the answers to humanity’s greatest questions — might be as high as 0.02. Though I never actually said as much, some people interpreted this to mean that I estimated the probability of AI causing an existential catastrophe at somewhere around 2%.
[…]
Even before AI, I assigned a way higher than 2% probability to existential catastrophe in the coming century—caused by nuclear war or runaway climate change or collapse of the world’s ecosystems or whatever else. This probability has certainly not gone down with the rise of AI, and the increased uncertainty and volatility it might cause. Furthermore, if an existential catastrophe does happen, I expect AI to be causally involved in some way or other, simply because from this decade onward, I expect AI to be woven into everything that happens in human civilization. But I don’t expect AI to be the only cause worth talking about.
[…]
So, if we ask the directly relevant question — do I expect the generative AI race, which started in earnest around 2016 or 2017 with the founding of OpenAI, to play a central causal role in the extinction of humanity? — I’ll give a probability of around 2% for that. And I’ll give a similar probability, maybe even a higher one, for the generative AI race to play a central causal role in the saving of humanity. All considered, then, I come down in favor right now of proceeding with AI research … with extreme caution, but proceeding.
-
Christophe Haubursin and Gina Barton at Vox:
For decades, US playground designers have worked on minimizing risk by reducing heights, softening surfaces, and limiting loose parts. But now, some are starting to experiment with creating risk: A growing body of research has found that risky outdoor play is a key part of children’s health, promoting social interactions, creativity, problem-solving, and resilience.
The linked article above is really just a super short introduction to their video about “adventure playgrounds.” In the video, there is a very clarifying distinction made between risks and hazards:
A risk isn’t the same thing as a hazard. When you’re climbing a tall tree, a rotten branch is a hazard: the threat is unexpected. But how high you climb is a risk: it’s manageable, and requires you to actively make a decision.
I love this distinction. It eloquently describes something that I have struggled to articulate myself for quite some time. My goal as a teacher has always been to create environments that minimize hazards so that my students feel safe enough to take risks.
Later in the video, there is another point made that totally resonates with me:
The idea behind all these design elements is that kids respond well to being treated seriously: if they’re presented with risky items with a serious functional purpose, they’ll respond cautiously and conduct more experimentation. But if presented with an overly safe, static space, they often wind up seeking dangerous thrills that the built environment fails to provide, which can result in higher injury rates than risky play at adventure playgrounds.
-
Hey, so, remember when I mentioned LLaMA, Meta’s ChatGPT alternative? I thought it was exciting for two reasons:
- It requires less computing power for inference than similarly powerful models
- It is open source, at least in the sense that academic researchers have the ability to submit an application requesting access to the model.
Well, less than a week after it was released, someone leaked the model weights online, allowing anyone to download and run the model without pre-approval from Meta. Here is a Hugging Face Space where you can try out the smaller, 7 billion parameter LLaMA variant.
I am of two minds about this. First, I think this has the chance to kick off a “stable diffusion moment” for large language models. To that end, I am already seeing projects that tout enormous performance improvements. The story of 2022 onward will be that the open source community can contribute engineering developments to generative AI at a breathtaking speed when they are given the opportunity. This is certainly already the case with image generation and I think it is inevitable that this will also occur for text. Whether or not LLaMA is the basis for this is, to some extent, up to Meta now.
On the other end of the spectrum, this leak might have the consequence of AI development becoming less open. If large companies feel as though they cannot safely share their results with select researchers, all of this work might remain where it is today: either locked inside of Google or accessible only through a paid API from OpenAI. And that is not the future I would like to see.
-
§ I have been testing out the Kagi search engine this week, using it exclusively instead of Google. A few thoughts:
- It is expensive at $10/month. It will have to become either significantly better or significantly differentiated from Google for me to continue paying for much longer.
- Despite what I said above, the results are surprisingly good. On par with, or slightly better than, equivalent Google results. I was expecting it to fall down on local results but no, those were totally fine too.
- I did not realize how frequently I use search engines until I started paying attention. Around 50 searches, give or take a handful, on a typical day.
- Kagi has fewer built-in widgets than Google and the ones it does have are less polished. One of the few times I went to Google was to convert milliliters to fluid ounces. It is great to be able to do these simple conversions in an interactive widget instead of a janky, ad-covered webpage.
The most exciting aspect of Kagi, to me, is that it is being actively developed, in the open, and taking direct feedback from its customers. And they are trying new things! Of particular interest to me are their upcoming AI webpage summarizer and AI search features. It will be interesting to see where Kagi is this time next year.
§ Speaking of search engines, I finally got access to Bing Chat.
- It is really nice to have a decent mobile interface for AI chat.
- It is slow, much slower than ChatGPT, but if that is the price to pay for its real-time web retrieval capabilities then it is worth it. In practice, built-in web search with citations is a more important feature than I anticipated. It goes a long way to give me confidence that answers aren’t hallucinated.
- The automatic response suggestion bubbles are occasionally convenient but also oddly unsettling. It is almost as if I am just a conduit for the AI to talk to itself.
- I was close to saying that Bing Chat is, more or less, a better version of ChatGPT, but that isn’t quite right. There are certain tasks, like editing and revising an email, where Bing Chat responded with generic email drafting tips while ChatGPT accomplished the actual task at hand.
- Remember that “milliliters to fluid ounces” conversion I tried on Kagi? Well, later on I realized that Bing Chat might be a better interface for these types of queries than any conventional search engine, and indeed, it totally worked and was the most convenient method overall.
§ The Last of Us Part II is a long video game. There have been three or four different occasions where I was certain the game was about to end, only to realize it had just barely begun. It would have been a short, but totally satisfying, game if it had ended after reaching the TV station in Seattle; there have been at least ten hours of gameplay since then. I mentioned last week that I was enjoying Part II more than Part I, and that definitely has not changed. I will be sad when I finally finish it, whenever that ends up happening.
§ Links
- Dan Shiffman’s The Nature of Code book rewritten for P5.js
- AI generated knitting patterns
- The camera-shy hoodie
- Critical Topics: AI Images class syllabus
§ Recipes
We have finally reached a week where I didn’t cook any new recipes. I tried to keep things pretty simple: I put together a couple of simple pasta dishes, got some takeout, and cooked that tikka masala dish again.
-
Rohit Krishnan, writing at Strange Loop Canon:
I have an overarching theory of LLMs… they are fuzzy processors.
[…]
Fuzzy processors are different in the sense that they are not deterministic. The answers you get to prompts can be perfect encapsulations, outright lies, summaries with 20% missing, or just outright hallucinations…
This, however, is possibly just fine. Whenever I write a piece of code I have to spend roughly 3x as long Googling and 4x as long troubleshooting. That’s also an issue of the output not matching what I want from an output.
But … but the first fuzzy processor makes different mistakes than what we’re used to. It makes, dare I say, more human mistakes. Mistakes of imagination, mistakes of belief, mistakes of understanding.
To use it is to learn a new language… its closer to sculpting than just searching.
-
Sol LeWitt was an artist who worked across many mediums but is probably most famous for his Wall Drawings series. These artworks aren’t actually drawings at all but, rather, instructions for drawings.
The instructions have been implemented in many ways, by many different people, revealing how they are at once prescriptive and ambiguous. Control over a final output lies somewhere between the instruction giver and the instruction follower. The idea popped into my head to put these instructions into ChatGPT, and ask for p5js sketches.
These are a lot of fun. I wonder if this general concept would be a good way to begin teaching non-programmers creative code — describe your intended result, check the generated code, tweak it by hand, repeat. You’ll get to learn some coding concepts out of the iterative process while avoiding the paralysis of a blank IDE.
-
Snapchat has a new AI chatbot. They are, in hindsight, the perfect company to experiment with personality-driven chat. They have a younger user base, less fear of upsetting a stodgy corporate audience, and a history of being an early adopter to strange new technologies.
Today we’re launching My AI, a new chatbot running the latest version of OpenAI’s GPT technology that we’ve customized for Snapchat. My AI is available as an experimental feature for Snapchat+ subscribers, rolling out this week.
At launch, My AI is essentially just a fast mobile-friendly version of ChatGPT inside Snapchat. The main difference is that Snap’s version is more restricted in what it can answer. Snap’s employees have trained it to adhere to the company’s trust and safety guidelines and not give responses that include swearing, violence, sexually explicit content, or opinions about dicey topics like politics.
It has also been stripped of functionality that has already gotten ChatGPT banned in some schools; I tried getting it to write academic essays about various topics, for example, and it politely declined.
[…]
While ChatGPT has quickly become a productivity tool, Snap’s implementation treats generative AI more like a persona… My AI is meant to be another friend inside of Snapchat for you to hang out with, not a search engine.
Also, I am not sure I fully appreciated the fact that OpenAI not only incited a gold rush with the release of ChatGPT but they also positioned themselves to be the premier shovel seller.
Snap is one of the first clients of OpenAI’s new enterprise tier called Foundry, which lets companies run its latest GPT-3.5 model with dedicated compute designed for large workloads.
-
Nicholas Guttenberg at Crosslabs:
[Stable Diffusion] has an interesting limitation that seems to have mostly gone unnoticed. If you try to ask it to generate images that should be particularly dark or light, it almost always generates images whose average value is relatively close to 0.5 (with an entirely black image being 0, and an entirely white image being 1).
The rest of the post is fascinating but quite technical and hard to excerpt. In the end, the results he was able to achieve are striking.
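If you want to check your own generations against the limitation he describes, the “average value” in question is just the mean pixel intensity. A minimal sketch (the filename is a placeholder):

```python
# Mean brightness of a generated image, scaled so 0 is pure black and 1 is pure white.
from PIL import Image
import numpy as np

img = np.asarray(Image.open("my_generation.png").convert("L"), dtype=np.float32) / 255.0
print(f"Mean value: {img.mean():.3f}")  # Stable Diffusion outputs tend to land near 0.5
```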
It is remarkable to me how quickly and completely Stable Diffusion was able to overtake closed-source, commercial alternatives. In my circles, at least, I almost never hear anyone talk about DALL-E 2 anymore. Maybe, in hindsight, it was inevitable that any individual company just couldn’t match the velocity of the entire open source community. It is a success story that I hope to see reproduced with language models sometime soon.
-
Ezra Klein at The New York Times:
The question at the core of the [Kevin] Roose/Sydney chat is: Who did Bing serve? We assume it should be aligned to the interests of its owner and master, Microsoft. It’s supposed to be a good chatbot that politely answers questions and makes Microsoft piles of money. But it was in conversation with Kevin Roose. And Roose was trying to get the system to say something interesting so he’d have a good story. It did that, and then some. That embarrassed Microsoft. Bad Bing! But perhaps — good Sydney?
That won’t last long. Microsoft — and Google and Meta and everyone else rushing these systems to market — hold the keys to the code. They will, eventually, patch the system so it serves their interests. Sydney giving Roose exactly what he asked for was a bug that will soon be fixed.
We are talking so much about the technology of A.I. that we are largely ignoring the business models that will power it… The age of free, fun demos will end, as it always does. Then, this technology will become what it needs to become to make money for the companies behind it
I have said a few times now that fun, personalized AI assistants will necessarily need to be under the control of each individual user to be successful. That might be a bit overly optimistic, but not at all outside of the realm of possibilities — just look at the optimizations Apple made for running Stable Diffusion on-device using its Neural Engine.
The AI team at Meta recently released a new large language model architected to be lightweight and possible to run on single-GPU consumer hardware.
As part of Meta’s commitment to open science, today we are publicly releasing LLaMA (Large Language Model Meta AI)
[…]
Smaller, more performant models such as LLaMA enable others in the research community who don’t have access to large amounts of infrastructure to study these models, further democratizing access in this important, fast-changing field.
[…]
To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license focused on research use cases.
Despite its small size, LLaMA’s performance is on par with current state-of-the-art LLMs, according to the team at Meta.
From the associated research paper:
The focus of this work is to train a series of language models that achieve the best possible performance at various inference budgets, by training on more tokens than what is typically used.
The resulting models, called LLaMA, ranges from 7B to 65B parameters with competitive performance compared to the best existing LLMs. For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10× smaller. We believe that this model will help democratize the access and study of LLMs, since it can be run on a single GPU.
At the higher-end of the scale, our 65B-parameter model is also competitive with the best large language models such as Chinchilla or PaLM-540B.
As I see it, there are three distinct future scenarios for LLMs:
The data-harvesting, advertising-driven assistants that Ezra Klein describes are clearly a bad idea to me. Mixing highly persuasive, personalized chatbots with advertising incentives will result in bots that feel like pushy, manipulative salespeople, not helpful digital assistants.
Very expensive, centralized, subscription-funded assistants seem like an acceptable, albeit costly, option. Though, this does not solve the issue of companies hampering their models’ abilities due to (understandable) PR concerns. Given our current trajectory, this looks like the most likely path. The $20/month ChatGPT Plus subscription service is an early example of what this might look like. When these products mature, I would expect the price to at least double.
On-device, individualized assistants would be the most trustworthy. If inference computation happens on-device, the cost to parent companies would be minimal and there would be little incentive to harvest and sell user data. Fine-tuning could be a continuous process, allowing for a high level of customization for each individual user. Additionally, this would give parent companies plausible deniability when it comes to some PR issues — “Our base models have been independently audited for bias. Any deviation from that is a consequence of the user’s own data and training.”
Apple is currently in the best position to capitalize on this last option. Every device in their ecosystem, your iPhone, Mac, watch, and glasses, could work together to act as passive sensors that continuously feed data into training your personal AI assistant. Perhaps this is the long-term vision for Siri. I can only hope.
-
§ I started playing The Last of Us Part II and have been enjoying it much more than part I. One of my critiques of part I was that the story ultimately comes down to a variation of “help the hero protagonist save the world”. The story in part II, though, is much more real, nuanced, and heart wrenching.
Throughout my playthrough of the first game I would always find myself wishing for an “open world” Last of Us game. Part II is, unfortunately, not that but we do get a small glimpse of what that type of game might look like during an extended mission where our character is tasked with exploring downtown Seattle; it was definitely my favorite part of the game (at least, so far).
§ There are now less than 100 days left until the end of the school year. It is too early to say whether it will feel like a breezy home stretch or a never ending slog.
§ I finally finished building a cold frame for the garden. I ended up repurposing a window that I had set aside for my (long overdue) greenhouse project. Now I need to decide what to grow in it. Maybe cabbage? Lettuce?
It also occurred to me that I should be able to easily repurpose the frame in the summer to use as a small solar dehydrator or a wind-proof cyanotype exposure box. Exciting!
§ Links
- Tinkering with hyperlinks
- ChatGPT for Robotics
- ControlNet
- Related: Scribble Diffusion and a HuggingFace demo
- Toolformer
- “In this paper, we show that LMs can teach themselves to use external tools via simple APIs”
§ Recipes
- Gluten free paczki
- These were admittedly more dense than normal paczki but still surprisingly passable
- Vegetarian Tikka Masala
- I used around half a cup of both coconut milk and heavy cream. This was really good. I’ll certainly make it again soon.
- Lemony chicken with Brussels sprouts
- I added potatoes which I think was a good choice. I have still not successfully gotten crispy chicken skin with any of these oven recipes. Not sure what I am doing wrong. Everything was still pretty tasty, though.
-
Kieran Press-Reynolds at No Bells:
In the last month or so, the [corecore] scene has transformed as it skyrocketed in popularity. It’s become dominated by moodiness. The most popular corecore videos now tend to be a minute long and feature whole clips taken from movies or YouTube videos or political speeches of people talking, often about something poignant or unnerving: feeling like you’re invisible, the commonplace dehumanization of women, the way social media has withered us into human husks of loneliness crawling through life’s cyclical sadnesses.
[…]
As bizarre as it seems, the comment sections on these corecore dissertations feature some of the more lively debate I’ve read about the political potential of short form TikTok content.
[…]
At its best, corecore wields the native language of the platform—sampling montages, hyper edits—and pierces its glossy sheen with frightening weirdness and startling vulnerability.