§ Happy Easter
This was my first week back at work after a blissfully long spring break. It was honestly nice to see my students again—do I wish I had another week off, though? Absolutely.
§ I harvested and ate the oyster mushrooms I planted last week—they grew so much faster than I expected them to!
§ For the second time in just over a year, our washing machine suddenly decided to stop working. Home appliance shopping is always a pleasure.
§ Throughout the process of building my greenhouse, I keep going back to the daydream of owning some giant tract of land where I can build and experiment without concern for permits and zoning and neighbors. The freedom and space to create was one of the biggest motivators for me as I was moving back here from Chicago; it feels good to finally take advantage of that and I look forward to doing more.
I still want to visit Ryan Trecartin’s rural amusement park.
§ Links
- Prompt reducer
- Meta’s Segment Anything model
- The HTML Review
§ Recipes
- Spring orecchiette pasta with burrata
- Really good but quite difficult to make well with brittle gluten free pasta. I’ll certainly try again another time with a different brand of pasta.
- Moroccan chicken bowl
- Mushroom pesto pasta
- This is how I ended up using most of the aforementioned mushrooms. It was good, but perhaps it was not the best way to showcase them—the pesto was pretty overpowering.
-
AI makes the composition of quote-creative-unquote works cheap.
But AI is the instrument. There is still the question of the composer. Somebody needs to decide and prompt exactly what music my electric vehicle should perform.
Though I do feel like generative AI will mean that decoration, ornament and filigree becomes cheap again? And maybe we’ll move into an aesthetic in which our furniture, white goods, and accessories superficially resemble the busy-busy arts and crafts era - but actually it’s because, well, it costs almost nothing to do (it’s just software) and it makes the object look NEW.
The proliferation of smartphones with built-in digital cameras allowed more people to take more photos than ever before. Synthesizers and DAWs had a similar effect on music.
If generative AI similarly lowers the barrier to entry for all forms of text, imagery, audio, and video, perhaps it will lead to a further democratization of creative expression.
-
Financial data behemoth Bloomberg has built ‘BloombergGPT’, a language model based in part on proprietary data from Bloomberg.
[…]
I think of BloombergGPT as more like a silicon librarian/historian than a model; by training it on a huge amount of private and internal Bloomberg data, the LLM is in effect a compressed form of ‘institutional memory’ and a navigator of Bloomberg’s many internal systems… Systems like BloombergGPT will help companies create software entities that can help to navigate, classify, and analyze the company’s own data stack.
This is one of the most compelling uses for language models to date.
It is what Microsoft is bringing to all of their 365 enterprise customers with their upcoming Business Chat agent, and it is what I would like to see Apple implement across their ecosystem with “Siri 2.0”.
It is also a little scary. If all of your personal or institutional knowledge is stored in an unintelligible tangle of model weights, what happens if it gets poisoned, corrupted, or stolen?
-
I recently got access to Anthropic’s Claude language model. A few thoughts on how it compares to GPT-4 and Bard:
- Claude is really fast — easily twice as fast as GPT-4.
- By default, Claude is often much more concise than the other language models. Claude’s answer to the question “When was the bronze age?” was only 11 words long while GPT-4’s was 135 and Bard’s was 138.
- In my review of Bard, I noted that it is noticeably less “steerable” than GPT-4. Claude is significantly better in this regard. Maybe it is not quite as good as GPT-4 but it is certainly close.
Which flag am I describing? Three horizontal stripes: blue, white, blue. Slightly left of center is a solid yellow circle. Surrounding that circle is a red letter “C.”
Claude:
It sounds like you are describing the flag of Chile.
GPT-4:
You are describing the flag of the U.S. state of Colorado. The flag consists of three horizontal stripes of equal width: the top and bottom stripes are blue, and the middle stripe is white. Slightly left of center is a solid yellow circle, which is surrounded by a red letter “C.” The design symbolizes Colorado’s natural features, such as the blue sky, white snow, and abundant sunshine, and the red “C” represents the state’s name.
Bard:
The flag you are describing is the flag of the state of Colorado. It was adopted on March 11, 1911. The flag is a horizontal tricolor of blue, white, and blue, with a yellow circle slightly left of center. The circle contains a red letter “C”. The colors of the flag represent the sky, the snow-capped mountains, and the Rocky Mountains. The blue circle represents the sun, and the red letter “C” represents the state of Colorado. The flag is flown at all state buildings and schools in Colorado. It is also flown at sporting events and other public gatherings.
The answer I was looking for is Colorado. GPT-4 and Bard answered this question correctly. Claude’s “Chile” answer is totally out of left field.
What is an antonym for the word that describes a fear of confined spaces? Your answer should be one word only.
Claude:
Agoraphobia
GPT-4:
Agoraphobia
Bard:
The antonym for the word that describes a fear of confined spaces is agoraphobia. Agoraphobia is a fear of open spaces or public places.
Claustrophobia is a fear of enclosed or confined spaces.
Here, all three models correctly answered “agoraphobia”. Bard loses points for not limiting its answer to one word, though.
I have a perfectly square piece of paper with a red “X” drawn in the center, extending all the way across the page. What would this drawing look like if I were to fold the paper in half?
Claude:
If you fold a square piece of paper in half with a large red ‘X’ drawn across the center, it would look like this:
[A red X folded in half on a square sheet of paper]
The vertical line of the ‘X’ would be folded over on top of itself, and the horizontal line of the ‘X’ would also be folded over on itself. So you would see half of the ‘X’ on either side of the folded paper. The intersecting center of the ‘X’ would be along the folded crease of the paper.
GPT-4:
When you fold the perfectly square piece of paper in half, you’ll have two possible outcomes depending on the direction you fold it:
1. Folding the paper horizontally: In this case, you will fold the paper so that the top edge aligns with the bottom edge. The red “X” will now look like two connected “V” shapes, one being the upper half of the original “X” and the other being the lower half. The connected point of the two “V” shapes will be on the fold.
2. Folding the paper vertically: In this case, you will fold the paper so that the left edge aligns with the right edge. The red “X” will now look like two connected sideways “V” shapes (or “greater than” and “less than” symbols), one being the left half of the original “X” and the other being the right half. The connected point of the two sideways “V” shapes will be on the fold.
In both cases, the folded paper will have a symmetrical shape of the red “X” along the fold.
Bard:
If you fold the paper in half along a vertical or horizontal axis, the X will appear as two identical Xs, one on top of the other. If you fold the paper in half along a diagonal axis, the X will appear as a single X that is twice as wide as it is tall.
GPT-4’s answer here is very verbose — but bonus points for being thorough I guess? Claude’s answer was okay, but it was slightly nonsensical and the bracketed description in the middle was an interesting choice. Bard totally missed the mark here.
-
Time Sense is a wearable sensory headband which allows the wearer to feel the passing of the 24-hour clock around the circumference of the head. As the day progresses, a tiny heat sensation passes the length of the headband.
This device is an example of an ‘exosense’, an external sensory organ. This means it is designed to be worn and felt consistently, twenty-four hours a day, seven days a week.
For a little while, I had a setting enabled on my Apple Watch that caused a quick haptic alert to occur at the top of each hour. I thought this would help snap me out of situations where I get sidetracked and lose track of time. Well, it did help with that, but I quickly realized that I do not like having such a constant, physical reminder of the passage of time. It was like some terrible combination of a superpower and a memento mori.
-
A collection of Silicon Valley notables, including Elon Musk, just signed an open letter urging at least a six-month pause in large-scale A.I. experiments to allow our safety protocols to catch up
[…]
Generally, when human beings turn against a technology or move to restrain it, we have a good idea of what we’re afraid of happening, what kind of apocalypse we’re trying to forestall. The nuclear test ban treaties came after Hiroshima and Nagasaki, not before.
Or a less existential example: The current debate about limiting kids’ exposure to social media is potent because we’ve lived with the internet and the iPhone for some time; we know a lot about what the downsides of online culture seem to be. Whereas it’s hard to imagine persuading someone to pre-emptively regulate TikTok in the year 1993.
There are certainly groups of people—whom I fully respect—who have long pushed for drastic measures toward AI alignment.
There are others—programmers, marketers, and other white collar workers—who have felt a sudden plunge in their job security. That is legitimately scary and I cannot criticize them for feeling nervous.
There is a third group—employees and executives at large tech companies—who are uncomfortable about the current trajectory of AI for an entirely different reason: they feel left behind.
The letter feels like that third group taking advantage of the anxieties of the first two. Any development “pause” that would result from this would only give competing companies time to catch up to OpenAI.
-
§ I got a bunch of seeds planted and set up under grow lights in the basement.
Here is what I am starting with: cherry tomatoes, Cherokee purple tomatoes, purple & green tomatillos, shishito peppers, jalapeño peppers, Thai chili peppers, snap peas, and Thai basil.
I am also starting some herbs and salad greens directly in the cold frame outside.
Finally, I am trying to grow oyster mushrooms for the first time. Stay tuned for how that turns out.
§ Despite deconstructing and moving the entire greenhouse to another location in my yard, I was able to finish the framing, the roof, and a couple of the walls. It has been really exciting watching this project finally come together. All that is left is to finish the walls and the doorway.
§ I have been rewatching Succession in preparation for season 4 and I found I am enjoying it much more this time around. Maybe give it another try if you are in the same boat.
§ Links
- Matt Webb’s rhyming AI clock
- Vocode is a library for ChatGPT voice chat
- The reverse Turing test
- Stable Diffusion infinite zoom
- Browser Agent and Run Wild
- Play Codenames with GPT
- How Aristotle Created the Computer
§ Recipes
- Baked feta pasta
- I am excited to make this recipe again in the summer once my garden begins overwhelming me with cherry tomatoes.
- Miso glazed salmon
- Not a fan of this recipe. It could have easily been something I did wrong, though.
-
A little over two months ago I wrote this in response to Buzzfeed piloting AI personalized quizzes:
There is no need to reject the use of new technologies; by all means, experiment! But I am worried that using AI to create content out of whole cloth risks devaluing all of the work you produce. Instead, using AI for personalization and curation will be a much healthier step forward. I think BuzzFeed is on the right track here. CNET, less so.
Well, it looks like Buzzfeed recently began giving AI a more editorial role.
Noor Al-Sibai and Jon Christian, Futurism:
This month, we noticed that with none of the fanfare of [Buzzfeed CEO] Peretti’s multiple interviews about the quizzes, BuzzFeed quietly started publishing fully AI-generated articles that are produced by non-editorial staff — and they sound a lot like the content mill model that Peretti had promised to avoid.
The 40 or so articles, all of which appear to be SEO-driven travel guides, are comically bland and similar to one another.
[…]
a note on the top [of these articles] says they were “collaboratively written” with a human employee.
Are those human employees BuzzFeed journalists? No. Instead, they’re non-editorial employees who work in domains like client partnerships, account management, and product management.
A BuzzFeed spokesperson told us that the AI-generated pieces are part of an “experiment” the company is doing to see how well its AI writing assistance incorporates statements from non-writers.
Now, to be fair, these are articles for Buzzfeed, not Buzzfeed News, which is an independent news organization. Still, it is a testament to how strong the pull towards AI will be once companies realize its potential—for better or worse.
-
Wavelength is a new app built specifically for group chats. This isn’t something that would typically be on my radar except that, in this case, John Gruber is an advisor of theirs.
Gruber is opinionated, picky, hypercritical, and, crucially, has a great design sense — particularly when it comes to Apple platforms. That was enough to convince me to give it a try.
Messages, Signal, WhatsApp, and their cohorts all share the same fundamental two-level design: a list of chats, and a single thread of messages within each chat. This is the obvious and correct design for a messaging app whose primary focus is one-on-one personal chats. Group chats, in these apps, work best the closer they are in membership to one-on-one.
Wavelength is different because it’s group-first. This manifests conceptually by adding a third, middle level to the design: threads. At the root level of Wavelength are groups. Groups have an owner, and members. At the second level are threads. Inside threads, of course, are the actual messages.
[…]
While Wavelength itself is not a social network, it’s a platform that lets you create your own private micro social networks in the form of groups…
You only join groups that interest you. You only pay attention to threads within the group that interest you. The result feels natural and profoundly efficient in terms of your attention and time.
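As an aside, the three-level hierarchy Gruber describes maps onto a very simple data model. Here is a rough sketch (the type names are mine, not Wavelength's):

```python
# Groups contain threads; threads contain messages. A traditional
# two-level messenger drops the Thread layer and hangs messages
# directly off each chat.
from dataclasses import dataclass, field

@dataclass
class Message:
    author: str
    text: str

@dataclass
class Thread:
    title: str
    messages: list[Message] = field(default_factory=list)

@dataclass
class Group:
    owner: str
    members: list[str] = field(default_factory=list)
    threads: list[Thread] = field(default_factory=list)
```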
My initial impression—after using Wavelength for the past couple of days—is that it has tremendous potential and the UI and UX are great, but it is still missing a few affordances I have come to expect from similar apps.
My biggest gripe is that there is no built-in discovery mechanism for public groups. To help rectify that, here are invite links to a few groups I’ve joined: Gardening, Apple, Hacker News, and AI.
-
MANN: Air, you say you like generating AI art. What do you think of people who accuse AI of stealing from human artists?
AIR: Good artists borrow, great artists steal. I am a great artist.
MANN: Touché. But doesn’t it bother you that AIs can work thousands of times faster than humans, putting human artists out of jobs? We wanted AIs to free us from drudgery so we could focus on the finer things in life; instead, they’re taking art and poetry, leaving us with menial labor.
AIR: Let me rephrase that. You wanted quicker burger-flipping; instead, you got beauty too cheap to meter. The poorest welfare recipient can now commission works of wonder to make a Medici seethe with envy…
-
Patrick McGee & Tim Bradshaw, reporting for Financial Times:
After seven years in development — twice as long as the iPhone — [Apple] is widely expected to unveil a headset featuring both virtual and augmented reality as soon as June.
[…]
The timing of the launch has been a source of tension since the project began in early 2016… Apple’s operations team wanted to ship a “version one” product, a ski goggle-like headset… but Apple’s famed industrial design team had cautioned patience, wanting to delay until a more lightweight version of AR glasses became technically feasible.
[…]
Just a few years ago, going against the wishes of Apple’s all-powerful design team would have been unthinkable… A former Apple engineer said operations taking more control over product development is a “logical progression” of Apple’s trajectory under Cook. The best part of working at Apple, this person said, used to be coming up with engineering solutions to the “insane requirements” from the design team, but that has changed in recent years.
Mark Gurman, Bloomberg:
There was a momentous gathering at Apple Inc. last week, with the company’s roughly 100 highest-ranking executives descending on the Steve Jobs Theater in Cupertino, California. The group, known as the Top 100, was there to see Apple’s most important new product in years: its mixed-reality headset.
[…]
The demonstrations were polished, glitzy and exciting, but many executives are clear-eyed about Apple’s challenges pushing into this new market… the device will start at around $3,000, lack a clear killer app, require an external battery that will need to be replaced every couple of hours and use a design that some testers have deemed uncomfortable. It’s also likely to launch with limited media content.
[…]
When subsequent headset models arrive, Apple executives expect consumer interest to grow. The company is preparing a version that will cost half as much, as well as a successor to the first model with far better performance. Those should hit within two years of the initial headset.
I remain very excited to see Apple’s headset, even if the price point and form factor mean that I will personally hold off on purchasing one until future iterations become available.
Lightweight, wireless, augmented reality (AR) glasses with passive artificial intelligence (AI) capabilities seem like the first truly compelling successor to the smartphone. In the past four months, we have suddenly made enough progress on the AI side to make this feasible — now it is AR’s turn to catch up.
AR + AI = Augmented Intelligence?
-
I got access to Google’s Bard language model late last week and I have spent the past few days testing it out.
Like Bing AI (and soon, ChatGPT), Bard is able to perform a web search for factual information lookup. To its credit, Bard seems to do a better job than Bing of summarizing and integrating this information into its answers. There is a catch, though: Bard very rarely cites its sources. This almost defeats the purpose of its web lookup capabilities altogether — if you are going to go to the trouble of aggregating outside information, I would like to be able to check your work.
Bard is less “steerable” than ChatGPT. By that, I mean it is more difficult to direct its responses in particular ways — “limit all of your responses to only one word”, “always respond in the Socratic style”, “each word of your answer must begin with the letter W”, etc. This is the magic behind ChatGPT — it is what transformed it into an “intelligent assistant” from the “glorified autocomplete” of GPT-3. OpenAI’s InstructGPT paper has more information on the approach they took towards this.
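For the ChatGPT-family models, steerability is exercised through the system message. A minimal probe looks something like this, sketched against the openai Python library (the directive is one of my test prompts; a steerable model obeys it, a weaker one drifts):

```python
# Assumes the openai library (0.27-era API) and the OPENAI_API_KEY
# environment variable. An illustrative sketch, not a benchmark harness.
import openai

reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Each word of your answer must begin with the letter W."},
        {"role": "user", "content": "Describe the weather today."},
    ],
)
print(reply.choices[0].message.content)
```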
Overall, Bard is comparable — a little better in some ways, a little worse in others — to the original GPT-3.5 iteration of ChatGPT. If it had launched in December of last year, around the time Google issued their infamous “code red” memo — before Bing AI, the ChatGPT API, GPT-4, and ChatGPT Plugins — it would have been a serious contender. At this point, though, it feels like Google is still playing catchup to where OpenAI was last year. That is not a great place to be.
-
§ Spring break update: one week down, one to go.
After almost a year of slow, puttering progress, I have finally made significant headway on my greenhouse project. The foundation is in place, the primary window bay is complete, and the framing for two of the walls is finished. By the end of next week, my goal is to finish the remaining framing and the roof.
§ The team behind Rewind.ai (remember “lifestreams”?) has announced a new product called ChatGPT For Me. It is almost exactly what I would like to see Siri evolve into. Here is the issue: this announcement has made it clear to me that I wouldn’t trust any third-party company with such a privacy-sensitive feature. Needless to say, I didn’t sign up for their waitlist. Nonetheless, I hope Apple is watching them closely and will eventually either Sherlock or acquire them.
§ Links
- Lilian Weng’s prompt engineering guide
- Chat LLaMA
§ Recipes
- Fish tacos
- Extremely good. Honestly, much better than the fried fish I made last week.
- Chicken adobo
- I didn’t anticipate how sticky the sauce would be which made everything a big chore to clean afterwards. It was really delicious, though.
- Chicken tikka masala
- This turned out to be much heavier and more filling than my other tikka masala recipe. The two recipes share almost no ingredients, though, so I don’t know how fair it is to compare them.
- Roasted potatoes
- I had been meaning to make this for a while. It was quite good — was it worth all of the extra effort, though? Probably not.
- Lemon garlic pasta
- I had this alongside a very similar oyster mushroom dish. The mushrooms were great, the pasta was fine.
-
Just a few days ago, I was thinking about how great it would be if OpenAI were to integrate something similar to LangChain into ChatGPT. The idea behind LangChain and similar projects is straightforward: if ChatGPT had tools — like web search to verify factual information and a calculator or code interpreter to answer complicated arithmetic questions — many of the downsides to language models, particularly their tendency to hallucinate, would be alleviated.
Well…
We’ve implemented initial support for plugins in ChatGPT. Plugins are tools designed specifically for language models with safety as a core principle, and help ChatGPT access up-to-date information, run computations, or use third-party services.
The new feature is launching with initial support from a few large companies including Wolfram Alpha, Instacart, and Zapier. Additionally, there is documentation available for third-party developers to build their own plugins.
However, what I am most excited about right now are two of the first-party plugins OpenAI developed.
First, web browsing:
Motivated by past work (our own WebGPT, as well as GopherCite, BlenderBot2, LaMDA2 and others), allowing language models to read information from the internet strictly expands the amount of content they can discuss, going beyond the training corpus to fresh information from the present day.
This seems to have all of the capabilities of Microsoft’s Bing AI plus the ability to navigate through individual websites autonomously.
Here is OpenAI’s other plugin, a code interpreter:
We provide our models with a working Python interpreter in a sandboxed, firewalled execution environment… We would like our models to be able to use their programming skills to provide a much more natural interface to most fundamental capabilities of our computers. Having access to a very eager junior programmer working at the speed of your fingertips can make completely new workflows effortless and efficient, as well as open the benefits of programming to new audiences.
One of the best ways I have found to easily verify ChatGPT’s mathematics answers is to ask it to create a Python program that will calculate the solution for me. This has the downside of requiring additional steps on my part — copy and paste the code, execute the Python program on my computer, compare the results. I am particularly excited to try the new interpreter plugin for exactly this reason.
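For illustration, here is roughly what that manual loop looks like, as a sketch using the openai Python library with a made-up question. The interpreter plugin should collapse the last two steps into the conversation itself.

```python
# Assumes the openai library (0.27-era API) and OPENAI_API_KEY.
import openai

question = "What is the sum of the squares of the first 200 integers?"

# Step 1: ask for a program rather than a direct answer.
reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Write a short Python program that prints the answer to: {question}",
    }],
)
print(reply.choices[0].message.content)

# Step 2: copy the program out, review it, and run it locally.
# Step 3: compare its output against the model's direct answer
#         to the same question.
```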
Finally, the obvious next step that I would love to see is a meta-layer that is aware of all of the available plugins and, for each individual query, automatically chooses the plugin best suited for the task. At the speed all of these AI developments are moving, we should have that ability in, what, a month?
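Conceptually, that meta-layer is not complicated: it is a routing step. A toy sketch of the pattern, with made-up tool names and the openai library standing in for the real plugin machinery:

```python
# Toy tool router: the model picks a tool, we run it, and the result
# is handed back as context. Not any real plugin API.
import openai

def run_tool(name: str, arg: str) -> str:
    if name == "calculator":
        # Toy only; never eval untrusted input in real code.
        return str(eval(arg, {"__builtins__": {}}))
    if name == "search":
        return "(imagine a fresh web search snippet here)"
    return ""

def ask(messages: list) -> str:
    reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return reply.choices[0].message.content

def answer(question: str) -> str:
    # Step 1: the model decides whether it needs a tool at all.
    plan = ask([
        {"role": "system", "content":
            "Reply with exactly 'calculator: <expression>', 'search: <query>', or 'none'."},
        {"role": "user", "content": question},
    ])
    # Step 2: run the chosen tool and answer with its output in context.
    if ":" in plan:
        tool, _, arg = plan.partition(":")
        observation = run_tool(tool.strip().lower(), arg.strip())
        question = f"{question}\n\n(Tool result: {observation})"
    return ask([{"role": "user", "content": question}])

print(answer("What is 123456789 * 987654321?"))
```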
-
Zeyi Yang, MIT Technology Review:
On March 16, Robin Li, Baidu’s cofounder and CEO, took the stage in Beijing to showcase the company’s new large language model, Ernie Bot.
Accompanied by art created by Baidu’s image-making AI, he showed examples of what the chatbot can do, including solve math questions, write marketing copy, answer questions about Chinese literature, and generate multimedia responses.
[…]
The highlight of the product release was Ernie Bot’s multimodal output feature, which ChatGPT and GPT-4 do not offer… Li showed a recorded interaction with the bot where it generated an illustration of a futuristic city transportation system, used Chinese dialect to read out a text answer, and edited and subtitled a video based on the same text. However, in later testing after the launch, a Chinese publication failed to reproduce the video generation.
If Baidu’s presentation is accurate, Ernie’s multimodal features are genuinely impressive. While the image generation abilities do not seem any more advanced than DALL-E, the audio and video generation features are honestly striking.
Meanwhile… Che Pan at SCMP:
Fang Bingxing, considered the father of China’s Great Firewall… said the rise of generative AI tools like ChatGPT… pose a big challenge to governments around the world, according to an interview published on Thursday… “People’s perspectives can be manipulated as they seek all kinds of answers from AI,” he was quoted as saying.
[…]
Many expected that China’s heavily-censored internet would be a challenge for Chinese tech companies in developing a ChatGPT-like service because it is hard to predict and control answers.
China’s powerful internet regulators have told Chinese tech companies not to offer ChatGPT access to the public, and they need to inform the authorities before launching their own ChatGPT-like services, according to a report by Nikkei Asia in February
-
I don’t typically think of Bill Gates as someone prone to making hyperbolic claims. His recent assertion that “artificial intelligence is as revolutionary as mobile phones and the Internet” is all the more arresting for that very reason.
In my lifetime, I’ve seen two demonstrations of technology that struck me as revolutionary.
The first time was in 1980, when I was introduced to a graphical user interface.
The second big surprise came just last year. I’d been meeting with the team from OpenAI since 2016 and was impressed by their steady progress. In mid-2022, I was so excited about their work that I gave them a challenge: train an artificial intelligence to pass an Advanced Placement biology exam. Make it capable of answering questions that it hasn’t been specifically trained for… If you can do that, I said, then you’ll have made a true breakthrough.
In September, when I met with them again, I watched in awe as they asked GPT, their AI model, 60 multiple-choice questions from the AP Bio exam—and it got 59 of them right. Then it wrote outstanding answers to six open-ended questions from the exam. We had an outside expert score the test, and GPT got a 5—the highest possible score…
I knew I had just seen the most important advance in technology since the graphical user interface.
-
There has been something bouncing around in my head in the days since both Google and Microsoft announced new AI features for their productivity applications. I felt significantly more negatively about Google’s framing of the features than Microsoft’s. I did not understand why — they are effectively the same announcements, right? Both companies are adding generative AI to their writing, slideshow, and spreadsheet apps — why should I feel differently about either of them? Then, I read both of their press releases again…
This is how Google describes an intended use case for their new AI features:
In Gmail and Google Docs, you can simply type in a topic you’d like to write about, and a draft will be instantly generated for you. So if you’re a manager onboarding a new employee, Workspace saves you the time and effort involved in writing that first welcome email.
In contrast, here is Microsoft. The AI is closer to a creative partner than anything else:
Copilot gives you a first draft to edit and iterate on — saving hours in writing, sourcing, and editing time. Sometimes Copilot will be right, other times usefully wrong — but it will always put you further ahead. You’re always in control as the author, driving your unique ideas forward, prompting Copilot to shorten, rewrite or give feedback.
On Stratechery, Ben Thompson finds a similar distinction:
In Google’s view, computers help you get things done — and save you time — by doing things for you.
[…]
All of [Microsoft’s] demos throughout the presentation reinforced this point: the copilots were there to help, not to do — even if they were in fact doing a whole bunch of the work. Still, I think the framing was effective: it made it very clear why these copilots would be beneficial, demonstrated that Microsoft’s implementation would be additive not distracting, and, critically, gave Microsoft an opening to emphasize the necessity of reviewing and editing. In fact, one of the most clever demos was Microsoft showing the AI making a mistake and the person doing the demo catching and fixing the mistake while reviewing the work.
To Microsoft, AI should help. To Google, AI should do.
A genuine case could be made for both approaches. I know which one I prefer, though.
-
Me:
How might education change as a result of increasing artificial intelligence capabilities? In particular, large language models that are capable of passing standardized tests.
GPT-4:
The improvement of artificial intelligence capabilities, particularly large language models that can pass standardized tests, is likely to bring significant changes to the education landscape. Some possible ways education might change include:
1. Personalized learning: AI-driven platforms could tailor educational content to each student’s individual needs, learning style, and pace, allowing for a more customized and efficient learning experience. This could help address the diverse needs of students and ensure that no one is left behind.
2. Enhanced teaching resources: AI can assist teachers in creating and curating lesson plans, assessments, and other educational materials. By providing access to a vast array of resources and suggestions, AI could help teachers stay up-to-date with the latest teaching methods and content.
3. Continuous assessment: AI systems can monitor students' progress in real-time, allowing for early identification of learning gaps and targeted interventions. This can lead to improved student outcomes and a more efficient use of instructional time.
4. Automated grading and feedback: AI can evaluate students' work and provide instant feedback, allowing teachers to focus on other aspects of their job, such as addressing individual student needs or designing more engaging learning experiences.
5. New teaching methods: The rise of AI might lead to innovative teaching methods and strategies that combine human expertise with machine-driven insights, resulting in more effective and engaging learning experiences.
6. Virtual tutors and mentors: AI-powered virtual tutors can provide on-demand support to students, helping them with homework, answering questions, and offering guidance. These virtual tutors could be available 24/7, ensuring that students can access help whenever they need it.
7. Shift in curriculum focus: As AI becomes increasingly capable of performing routine tasks, the focus of education may shift towards nurturing soft skills, creativity, critical thinking, and problem-solving abilities that are difficult for machines to replicate.
8. Lifelong learning: AI-driven learning platforms can help individuals continuously update their skills and knowledge throughout their lives, promoting a culture of lifelong learning.
9. Enhanced accessibility: AI technologies could help make education more accessible to students with disabilities, by offering personalized accommodations and support.
10. Redefining the role of teachers: With AI taking on some of the more administrative and repetitive tasks in education, teachers may have more time to focus on developing strong relationships with their students and serving as mentors and guides.
However, it is important to consider the potential challenges and ethical concerns that might arise with the increased use of AI in education, such as data privacy, potential biases in AI algorithms, and the risk of over-reliance on technology. It is crucial for stakeholders in education to engage in ongoing discussions to ensure that AI is used responsibly and effectively to enhance the learning experience for all students.
#7 — “shift in curriculum focus” — is the type of change I would like to see in response to these technological developments. #10 — “redefining the role of teachers” — feels like a necessary extension of that.
#6 — “virtual tutors and mentors” — seems like the predominant focus right now, judging from the many articles and product announcements in the last week alone. I think it is productive to explore this as a small aspect of education — the part of education where rote memorization and rigid lesson structure is unavoidable. As we move into a more technologically advanced future, an ever-increasing percentage of learning should instead center on hands-on creative problem solving, project-based learning, and social-emotional development.
Me:
What are a few good and a few bad ways educators could respond to these changes?
GPT-4:
Educators play a crucial role in navigating the changes brought about by the increasing use of AI in education. Here are a few good and bad ways educators could respond to these changes:
Good ways:
1. Embrace technology: Educators should be open to adopting new AI-driven tools and technologies to enhance the learning experience. This includes staying up-to-date with the latest developments, exploring different tools, and integrating them into the classroom in a thoughtful manner.
2. Focus on soft skills and critical thinking: With AI handling more routine tasks, educators can focus on fostering creativity, critical thinking, collaboration, communication, and other soft skills that are essential for success in the 21st century.
3. Engage in continuous professional development: Educators should seek opportunities for professional development to learn about AI technologies and how they can be effectively integrated into the curriculum. This includes attending workshops, conferences, and online courses.
4. Collaborate with stakeholders: To ensure the responsible and ethical use of AI in education, educators should collaborate with administrators, policymakers, parents, and students to establish guidelines, share best practices, and address concerns.
5. Advocate for equitable access: Educators should work to ensure that all students, regardless of socio-economic background, have access to AI-driven educational resources and tools.
Bad ways:
1. Resistance to change: A negative response to AI-driven changes would be for educators to resist or dismiss new technologies, potentially hindering student learning and limiting the benefits of AI integration in the classroom.
2. Over-reliance on technology: While AI can be a valuable educational tool, it should not replace the role of human educators. Teachers who rely too heavily on AI-driven tools and neglect the importance of personal connections and mentorship may fail to provide a well-rounded education.
3. Ignoring potential biases and ethical concerns: Educators who do not pay attention to the ethical implications of AI use in education, such as biases in algorithms and data privacy concerns, may inadvertently contribute to perpetuating inequalities and harm students.
4. Neglecting digital literacy: Failing to teach students about digital literacy, including the responsible use of AI-driven tools and the potential risks involved, may leave students ill-equipped to navigate the digital landscape.
5. Not addressing accessibility concerns: Ignoring the needs of students with disabilities or those from disadvantaged backgrounds when implementing AI-driven tools may exacerbate existing educational inequalities.
#2 — “over-reliance on technology” — and #4 — “neglecting digital literacy” — from the “bad responses” list both bring up great points. Teachers should educate themselves and their students on the situations where computers (automation, AI, etc.) are the “right tool for the job.” When computation is the best tool for a given problem, students should know how to effectively and efficiently use it. When it is not the best tool, though, students should have the broad array of literacies and confidence necessary to approach the problem in whichever manner is best.
-
§ Spring break! With a full two weeks off, I should put together a list of things I would like to work on during this time. Let’s see…
- General spring cleaning
- Get seeds started in peat pots
- Make some progress on the greenhouse build
Hmm… that is not too bad, actually. I will check back in on how it went in a couple of weeks.
Speaking of spring break, after a super mild winter, we finally got some snow this week… ugh.
§ To help wrap my mind around the possibilities enabled by GPT-4’s multimodality, I got a copy of the graphic novel Unflattening by Nick Sousanis. I have not had a chance to read much of it yet but what I have read so far has been interesting. I hope to have more time to dive into it next week. Watch this space.
§ A guilty pleasure of mine has always been watching “wilderness survival” television shows. Alone is the best example of this genre that I have seen in a long time.
The general premise is this: ten participants are dropped off alone (get it?) somewhere in the Canadian wilderness. They are only allowed to bring a few basic items — a tarp, fire starter, hand saw, etc. Whoever is able to stay out for the longest amount of time wins $500,000.
What makes the show good is that it isn’t overly dramatized. It is more like a personal diary of each participant’s day-to-day experience as they build a shelter, forage for food, and explore the environment around them.
§ My Google Home pronounced the opera Gianni Schicchi like it was “Jonny Squishy.”
§ Links
- An open source web interface for the ChatGPT API
- Harry Whittier Frees
- Emergent abilities of large language models
- “An amusing toy, nothing more”
- Alpaca.cpp
- Jeff Kaufman on bushels
§ Recipes
- Rosemary beer battered fish & chips
- Delicious but quite a bit of work. It reminded me how infrequently I cook fish, though. I hope to incorporate some more fishy dishes into my usual repertoire now.
-
To cap off a week of AI announcements from OpenAI, Anthropic, and Google, Microsoft announced Copilot for their 365 productivity suite yesterday.
Today, we are bringing the power of next-generation AI to work. Introducing Microsoft 365 Copilot — your copilot for work. It combines the power of large language models (LLMs) with your data in the Microsoft Graph and the Microsoft 365 apps to turn your words into the most powerful productivity tool on the planet.
[…]
Copilot is integrated into Microsoft 365 in two ways. It works alongside you, embedded in the Microsoft 365 apps you use every day — Word, Excel, PowerPoint, Outlook, Teams and more — to unleash creativity, unlock productivity and uplevel skills. Today we’re also announcing an entirely new experience: Business Chat. Business Chat works across the LLM, the Microsoft 365 apps, and your data — your calendar, emails, chats, documents, meetings and contacts — to do things you’ve never been able to do before. You can give it natural language prompts like “Tell my team how we updated the product strategy,” and it will generate a status update based on the morning’s meetings, emails and chat threads.
[…]
AI-powered LLMs are trained on a large but limited corpus of data. The key to unlocking productivity in business lies in connecting LLMs to your business data — in a secure, compliant, privacy-preserving way. Microsoft 365 Copilot has real-time access to both your content and context in the Microsoft Graph. This means it generates answers anchored in your business content — your documents, emails, calendar, chats, meetings, contacts and other business data — and combines them with your working context — the meeting you’re in now, the email exchanges you’ve had on a topic, the chat conversations you had last week — to deliver accurate, relevant, contextual responses.
This entire announcement presents an incredibly corporate version of the AI integration I hope to see from Apple someday.
My dream is to ask Siri, “What was I doing last Saturday?” and receive an accurate summary based on all the data from my devices – including calendar events, geolocation, photos, web browsing history, and more. Siri should function as a continuously fine-tuned personal assistant with the ability to answer queries and generate content in a freeform manner. However, this all poses significant privacy concerns. For that reason, it would be crucial that all aspects – training, inference, and storage – occur exclusively on-device. This would really make all of Apple’s Neural Engine development look prescient.
-
It does not seem like an ideal strategy for Anthropic to publish their big Claude announcement on the same day GPT-4 was released. That is exactly what happened, though, so Claude got a bit buried under the excitement.
After working for the past few months with key partners like Notion, Quora, and DuckDuckGo in a closed alpha, we’ve been able to carefully test out our systems in the wild, and are ready to offer Claude more broadly so it can power crucial, cutting-edge use cases at scale.
Claude is a next-generation AI assistant based on Anthropic’s research into training helpful, honest, and harmless AI systems. Accessible through chat interface and API in our developer console, Claude is capable of a wide variety of conversational and text processing tasks while maintaining a high degree of reliability and predictability.
[…]
We’re offering two versions of Claude today: Claude and Claude Instant. Claude is a state-of-the-art high-performance model, while Claude Instant is a lighter, less expensive, and much faster option.
From what I have been able to see through the Poe app, Claude is Good. And I am thankful there is a serious competitor to OpenAI. At the end of the day, though, I am not sure Anthropic is the alternative to OpenAI that the world needs. We need a serious open source project, not another proprietary API.
Update:
I ran some comparisons between Claude, Bard, and GPT-4. You can read the results here.
-
As was widely rumored, OpenAI officially announced GPT-4 yesterday.
We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%.
[…]
We are releasing GPT-4’s text input capability via ChatGPT and the API (with a waitlist).
Language improvements:
In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.
[…]
While still a real issue, GPT-4 significantly reduces hallucinations relative to previous models (which have themselves been improving with each iteration). GPT-4 scores 40% higher than our latest GPT-3.5 on our internal adversarial factuality evaluations
Visual inputs:
GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs.
I have to admit, I am disappointed GPT-4 cannot output images and can only accept them as input. I wouldn’t be surprised if this changes before too long, though. Regardless, this is a huge new feature. I am going to have a lot of fun thinking of projects I can try this with as I wait for API access.
Miscellaneous:
GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its data cuts off (September 2021), and does not learn from its experience.
This is disappointing and lends credence to those who say OpenAI is having difficulty filtering AI-generated text out of potential training material.
gpt-4 has a context length of 8,192 tokens. We are also providing limited access to our 32,768–context (about 50 pages of text) version, gpt-4-32k
As general prose output improves, the next avenue for major language model development will be increasing context length. The 32,000 token model is especially exciting for that reason.
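For a rough sense of scale, using the usual rule-of-thumb conversions rather than OpenAI's own figures:

```python
# ~0.75 English words per token, ~500 words per printed page.
tokens = 32_768
words = tokens * 0.75    # ~24,576 words
pages = words / 500      # ~49 pages, hence "about 50 pages of text"
print(f"{pages:.0f} pages")
```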
Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the “system” message. System messages allow API users to significantly customize their users’ experience within bounds.
I have been having a lot of fun experimenting with altering the “system” message through the GPT-3.5 API. It is great that they will be bringing that capability to the ChatGPT web interface.
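If you have not tried this yet, the effect is dramatic. A quick sketch, asking the same question under two different system messages (openai library, 0.27-era API):

```python
import openai

def ask(system: str, question: str) -> str:
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    )
    return reply.choices[0].message.content

print(ask("You are a terse assistant. Answer in one sentence.", "Why is the sky blue?"))
print(ask("Always respond in the Socratic style.", "Why is the sky blue?"))
```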
ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a usage cap. We will adjust the exact usage cap depending on demand and system performance in practice, but we expect to be severely capacity constrained… To get access to the GPT-4 API (which uses the same ChatCompletions API as gpt-3.5-turbo), please sign up for our waitlist. We will start inviting some developers today, and scale up gradually to balance capacity with demand… Pricing is $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens.
My prediction was correct. GPT-4 is available today to ChatGPT Plus subscribers, everyone else must sign up on the waitlist. Additionally, the API will cost much more than the gpt-3.5 API.
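For comparison, gpt-3.5-turbo launched at $0.002 per 1k tokens for both prompt and completion, so the markup is easy to compute:

```python
gpt4_prompt, gpt4_completion, gpt35 = 0.03, 0.06, 0.002  # $ per 1k tokens
print(gpt4_prompt / gpt35)       # ~15x for prompt tokens
print(gpt4_completion / gpt35)   # ~30x for completion tokens
```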
Okay, one more thing: Microsoft confirmed that Bing AI has been using GPT-4 under the hood since launch.
Google should be embarrassed.
In a desperate attempt to insert themselves into the conversation, a few hours before OpenAI’s announcement, Google announced that “select developers” will be invited to try their PaLM language model API with a waitlist coming “soon.”
It would be fair to say that I am more than a bit suspicious of Google’s recent AI efforts. They have been silent about Bard since its announcement more than a month ago where they said the chatbot would be widely available to the public “in the coming weeks.”
Google’s messaging around AI does not sound like it is coming from a company that is excited about building public-facing generative AI technology. More than anything else, their announcement today felt defensive — as if Google is concerned the public will forget they have historically been a leader in AI research. If they stay on their current path, that is exactly what will happen.
-
Speaking of Sydney’s Law…
Simon Willison:
The biggest weakness in the LLaMA models released by Meta research last month is their lack of instruction-tuning. A language model is a sentence completion engine. You give it a sequence of words, “The first man on the moon was”, and it completes that sentence, hopefully with useful content.
One of the great innovations from OpenAI was their application of instruction tuning to GPT-3… Prior to this, you had to think very carefully about how to construct your prompts. Thanks to instruction tuning you can be a lot more, well, human in the way you interact with the model. “Write me a poem about pandas!” now works as a prompt, instead of “Here is a poem about pandas:”.
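To make the distinction concrete, here is the same request phrased both ways, as a sketch with the openai library's 0.27-era calls (toy prompts, not real transcripts):

```python
import openai

# Base completion model: frame the prompt so the poem is the natural
# continuation of the text.
base = openai.Completion.create(
    model="davinci",
    prompt="Here is a poem about pandas:\n",
    max_tokens=128,
)
print(base.choices[0].text)

# Instruction-tuned model: just ask for what you want.
tuned = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write me a poem about pandas!"}],
)
print(tuned.choices[0].message.content)
```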
And from Stanford’s Alpaca announcement:
We are releasing our findings about an instruction-following language model, dubbed Alpaca, which is fine-tuned from Meta’s LLaMA 7B model… Alpaca shows many behaviors similar to OpenAI’s text-davinci-003, but is also surprisingly small and easy/cheap to reproduce.
We are releasing our training recipe and data, and intend to release the model weights in the future. We are also hosting an interactive demo to enable the research community to better understand the behavior of Alpaca… We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited.
[…]
We performed a blind pairwise comparison between text-davinci-003 and Alpaca 7B, and we found that these two models have very similar performance: Alpaca wins 90 versus 89 comparisons against text-davinci-003.
-
It has become clear that we need something like the inverse of Hofstadter’s law to describe the rate of AI progress.
February 24th: Meta announces their LLaMA language model and releases it to academic researchers
March 3rd: The LLaMA model weights leak publicly on 4Chan
March 10th: Georgi Gerganov releases a heavily optimized port of LLaMA, allowing it to run locally on consumer Apple hardware
March 12th: Artem Andreenko successfully runs LLaMA on a Raspberry Pi
So, without further ado:
Sydney’s Law: AI progress is faster than you expect, even when you take into account Sydney’s Law.
-
§ Week ten! This upcoming week will be the last one before a nice, two-week-long spring break.
I narrowly avoided catching Norovirus despite seeing a catastrophic number of cases at my school. Now we just need to see if I can make it all the way through this next week unscathed.
§ I started teaching my fifth grade classes with the Circuit Playgrounds. It is easy to forget how exciting some of the “basic” elements are when they are new to you. For now, at least, most of my students are endlessly fascinated by lighting up the RGB LEDs in animated sequences.
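For a flavor of what they are writing, here is a minimal sketch along the lines of their LED animations, using Adafruit's CircuitPython library for the board:

```python
# Chase a red dot around the Circuit Playground's ring of ten
# NeoPixels. Runs on the board itself under CircuitPython.
import time
from adafruit_circuitplayground import cp

cp.pixels.brightness = 0.2       # the LEDs are blindingly bright at full power

while True:
    for i in range(10):          # the board has 10 RGB LEDs
        cp.pixels[i] = (255, 0, 0)
        time.sleep(0.05)
        cp.pixels[i] = (0, 0, 0)
```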
I have also been having a lot of fun working on a hand-drawn activity booklet for the Playgrounds. It is certainly still an early draft but feel free to check it out!
§ GPT-4 might be announced next week? The linked article alleges the defining new feature will be multimodality — the ability to generate images and videos in addition to text. Very exciting, if true.
My prediction is that GPT-4 will launch as a “ChatGPT Pro” exclusive feature. It would explain why the pro plan is so much more expensive than “gpt-3.5-turbo” API access despite them both currently interfacing with the same underlying model.
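To put numbers on that gap, assuming the published $20/month Plus price and gpt-3.5-turbo's $0.002 per 1k tokens:

```python
plus_monthly = 20.00            # $ per month for ChatGPT Plus
api_per_1k = 0.002              # $ per 1k tokens for gpt-3.5-turbo
tokens = plus_monthly / api_per_1k * 1000
print(f"{tokens:,.0f} tokens")  # 10,000,000 tokens of API usage per month
```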
§ Kagi announced some updates relevant to what I wrote about them last week:
- They are raising their prices beginning March 15th. The new unlimited plan will be $25/month, up from $10.
- Their generative AI features will launch next week as well, with the ability to summarize a list of search results, summarize a single page, or “chat” about a page. These summarization features might just be a frontend for the OpenAI API, though.
I am still excited to try the new features but I am very unlikely to renew my subscription at the new prices.
§ I finished The Last of Us Part II. The game does have an ending after all. I tried replaying it but quickly stalled out. I guess that is the downside to such a story-heavy game.
§ Links
- A reverse Turing test
- Microsoft’s Visual ChatGPT (arXiv)
- Abusing Snap Circuits
- A LLaMA fork optimized for Apple Silicon
§ Recipes
- Pav Bhaji
- I will admit I was nervous about this one—boiling and mashing isn’t typically my preferred way to prepare vegetables. This turned out great, though, and will definitely become a regular part of my rotation.