• Apple kicked off its annual WWDC conference on Monday. Here are my initial impressions after watching the keynote:

    macOS, iOS, and watchOS

    • There were a lot of mentions of “on-device intelligence” and “machine learning.” No one said “AI.”
    • There is a new Mac Pro with Apple Silicon as well as a 15” MacBook Air. Both will be available next week.
    • The iOS 17 presentation started with “big updates to the Phone app” which I would have never in a million years guessed. I will admit, the new “Live Voicemail” feature looks great though.
    • The long segment dedicated to iMessage’s “new Stickers experience” should put to rest fears that Apple would ever feel pressed for time. Indeed, the keynote was over two hours long.
    • Autocorrect in iOS 17 is powered by “a transformer-based on-device language model.” It will be able to correct on a sentence level rather than individual words.
    • The Journal app that was rumored is real but won’t be available at launch—it is coming “later this year.”
    • Interactivity is coming to widgets on all platforms. On macOS Sonoma, you will be able to add widgets to the desktop.
    • You will be able to set multiple simultaneous timers.
    • Death Stranding will be coming to Apple Silicon Macs. There was no mention of it during the later headset discussion.
    • Safari gets a new Profiles feature. I’ve always loved Containers in Firefox and have missed them since switching to Safari. It seems like a logical extension of the OS-wide Focus Mode feature they introduced last year.
    • watchOS 10 is launching with a comprehensively redesigned interface. A notable exception is the “honeycomb” app launcher, which appears unchanged.
    • There is still no ability for third-party developers to create custom watch faces. Apple is offering the consolation prize of “a new Snoopy and Woodstock watch face.”
    • iPadOS got… no new pro-level features? I am kind of shocked Apple didn’t save their recent Final Cut and Logic Pro app release announcement for this event.

    One more thing

    • It is official: Apple announced their new XR goggles and they are called “Vision Pro.”
    • Apple is calling this their first “spatial computing” device, which is a better descriptor than AR/VR/XR.
    • They really do look a lot like ski goggles.
    • There is a screen on the front of them that displays your eyes. It is a weird concept that was executed in a much better way than I would have ever expected. The more I think about it—and I can’t believe I’m saying this—it might be the defining innovation here. I expect to see it copied by other hardware makers soon.
    • The hardware looks bulky and awkward. The software, UX, and design language, though, looks incredible.
    • For input, there is eye tracking, hand gesture recognition, voice, and a virtual keyboard. Vision Pro also works with physical Magic Keyboards and game controllers.
    • The headset can capture 3D photographs and videos.
    • It has two hours of battery life with an external battery.
    • Leading up to this event, a lot of people were speculating that the Vision Pro would be cheaper than its rumored price of $3000—in reality, it will be more expensive at $3499.
    • Vision Pro is clearly a first-generation product. It is expensive, and even with bulky hardware and an external battery pack it has a short battery life. Waiting for the second-generation version is unquestionably the smartest decision. It is going to be extremely tempting, though. At least I’ll have some time to decide—it will be available to purchase next year.
    • I can’t wait to try them.
  • NPR’s Planet Money podcast has just concluded a three-part series in which they used generative AI to write an episode for them.

    Kenny Malone, Planet Money:

    In Part 1 of this series, we taught AI how to write an original Planet Money script by feeding it real research and interviews. In Part 2, we used AI to clone the voice of our former colleague Robert Smith. 

    Now, we’ve put everything together into a 15-minute Planet Money episode.

    I didn’t find the simulated Robert Smith voice particularly convincing, but that might be because I have so much experience listening to the real Robert Smith. I think AI-generated voices are already good enough for many lower-stakes applications, but pacing and inflection matter too much in podcasting, and we are not quite there yet.

    In terms of content, I thought the episode was, at times, slightly nonsensical and bland, but overall totally passable. If I hadn’t been primed in advance to expect AI content, there is a chance I wouldn’t have noticed.

    I don’t think I would feel particularly good about spending much of my time listening to wholly AI-generated podcasts, but it seems somewhat inevitable once the voice simulation technology improves.

    It would be fascinating to see the Planet Money team revisit this experiment in a few years.

  • § School is out. Classes are now completely finished for the year.

    I just have a few meetings and some loose ends to tie up this week. After that, I have set aside a couple of weeks to enjoy the summer before my new job starts up.


    § Now that Succession has concluded, I can easily say that this final season was their best. I can’t recall that being the case with any of my other favorite shows.

    Fingers crossed for a Better Call Saul-style spin-off series starring Greg or Connor.


    § I have been sawing down some stray tree branches in the backyard to give the plants access to a bit more sunlight.

    Using all of the extra pliable branches, I have been attempting to construct a small wattle fence. The process couldn’t be more straightforward, but it is still a lot of work.


    § The video game developer Hideo Kojima is rumored to be working with Apple on a game for their upcoming XR headset.

    Although I don’t think of myself as a particularly avid video game player, this is actually the news that has made me the most excited about the headset so far.

    A few years ago I purchased a PlayStation 4 just to play Kojima’s previous game: Death Stranding—it isn’t impossible something similar will end up happening again here.


    § Links

    § Recipes

    • Jaeyook Kimchi Bokum
      • While not technically related, there is a sense in which this feels like a better version of Phat Bai Horapha, which I linked to a little while back.
      • I’ve seen alternative recipes for this where cabbage and carrots are included. I am definitely going to try adding those next time.
  • The conversation around Apple’s upcoming Worldwide Developers Conference understandably centers around their rumored XR headset. The launch of an entirely new computing platform will undoubtedly make for an exciting event but there is an entirely different set of possibilities that I haven’t seen discussed nearly as much this year. Aside from the big stuff, WWDC always brings a slate of equally impactful, but less flashy, changes. Here is my wishlist from that second category:

    • A dedicated “Passwords” app outside of Settings
      • Bonus: a refresh of Keychain Access on macOS
    • The new journaling app that is rumored for iOS 17, especially if it incorporates data I already have from years past
    • Some love for Apple Mail—I am getting jealous of Mimestream
    • Better widgets, interactive widgets, widgets on the macOS desktop
    • When I pause audio, leave the Dynamic Island’s player controls accessible for longer than a fraction of a second
    • Improve notifications on macOS
    • Add a clipboard manager to iOS and iPadOS
  • Adam Mastroianni:

    Design involves both technological engineering and psychological engineering, and psychological engineering is harder. Doors don’t often fall off their hinges, get stuck, or snap in half—all feats of technological engineering. They do often lock accidentally, set off unintended alarms, and mislead people about how to open them—all failures of psychological engineering.

    […]

    Once spotted, psychological engineering problems are tricky to solve. Unlike battery life or fire resistance, “intuitiveness” is hard to quantify and thus hard to optimize. The fundamental attribution error leads us to blame design failures on stupid people rather than stupid products.

    […]

    Anyone who can overcome these challenges is rewarded with indifference. Psychological engineering problems are hard to spot in the first place, so people rarely notice when you solve them. People hate pushing a pull door, but they don’t think at all when they push a push door. Unlike technological engineering, which can be explained and sold (“This car gets 55 miles to the gallon!”) and thus copied and improved, good psychological engineering melts into the background.

    The good designs that don’t spread, then, are probably solving psychological engineering problems. Technological engineering marches ever forward, which is why the next phone you get will be slimmer and faster and last longer on a single charge. Psychological engineering remains stuck, which is why the next building you enter will probably be full of Norman Doors.

  • Guanzhi Wang et al.:

    We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention… Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch

    […]

    Voyager exhibits superior performance in discovering novel items, unlocking the Minecraft tech tree, traversing diverse terrains, and applying its learned skill library to unseen tasks in a newly instantiated world. Voyager serves as a starting point to develop powerful generalist agents without tuning the model parameters.

    Last month we saw Stanford researchers create a version of The Sims inhabited by LLM-powered agents. These agents exhibited surprisingly complex social skills.

    This new research shows that agents based on a similar architecture can create and explore in novel environments.

    As this technology becomes less expensive, we will start to see incredible new virtual experiences that were previously unimaginable.

    As capabilities improve further, we will reach the point where we pass some sort of fundamental threshold—like the uncanny valley—where the characters that inhabit our virtual environments become too lifelike.

    At its height, people spent a lot of time playing Second Life and it, well, looked like Second Life. We don’t even need hyperrealistic experiences for things to start getting scary, though. Imagine a version of Grand Theft Auto where every NPC has their own unique set of ambitions and relationships. I wouldn’t be surprised if someone could hack that together with the technology available today. Once that happens, we will need to start having some difficult conversations.
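
    To make the “hack that together” claim concrete, here is a minimal sketch of what a single LLM-backed NPC loop might look like. Everything in it is hypothetical: the complete() placeholder, the persona fields, the memory format. Real systems like the Stanford agents or Voyager layer planning, reflection, and skill libraries on top of something like this.

        import json

        # Hypothetical persona-plus-memory loop for one LLM-backed NPC.
        # complete(prompt) is a placeholder for any LLM call, local or hosted.
        def complete(prompt: str) -> str:
            raise NotImplementedError("plug in your model of choice here")

        class NPC:
            def __init__(self, name, ambition, relationships):
                self.name = name
                self.ambition = ambition            # e.g. "open a noodle shop downtown"
                self.relationships = relationships  # e.g. {"Maria": "rival", "Lee": "old friend"}
                self.memories = []                  # running log of what the NPC saw and did

            def act(self, observation: str) -> str:
                prompt = (
                    f"You are {self.name}, a character in an open-world game.\n"
                    f"Ambition: {self.ambition}\n"
                    f"Relationships: {json.dumps(self.relationships)}\n"
                    f"Recent memories: {json.dumps(self.memories[-10:])}\n"
                    f"You observe: {observation}\n"
                    "Reply with one short line of dialogue or action."
                )
                action = complete(prompt)
                self.memories.append({"saw": observation, "did": action})
                return action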

  • Ilia Blinderman, from the excellent visual journalism publication The Pudding, shares some of what he has learned about how to create well designed, data-driven essays:

    We have a curious tendency of assuming that people who can do certain things that we cannot are imbued with superior innate talents… This may be especially common for the sort of code-driven interactive data visualizations which we work on, since they rely on an odd grab-bag of skills — critical thought, design, writing, and programming — that people in many other professions may have neither a full awareness of, nor full expertise in.

    […]

    I’m hoping that my putting this guide together will help remove some of the unnecessary mystique surrounding data viz, and demonstrate that the only things that separate a beginner from a speaker on the conference circuit [are] time and practice.

    I recently wrote about how I am currently wrapping up a big data visualization project with my students—I wish I had known about this resource earlier!

  • Jieyi Long:

    In this paper, we introduce the Tree-of-Thought (ToT) framework, a novel approach aimed at improving the problem-solving capabilities of auto-regressive large language models (LLMs). The ToT technique is inspired by the human mind’s approach for solving complex reasoning tasks through trial and error.

    Here is the problem: LLMs do not know whether the answer they are currently generating is accurate or optimal. Once they start down a particular path, they are locked in, unable to reconsider unless they are later prompted to.

    Language models do not explicitly perform logical correctness checks as it generates a new token based on the previous tokens. This limits the model’s capacity to rectify its own mistakes. A minor error could be amplified as the model generates more tokens

    Tree-of-thought lets the model explore multiple solutions, backtracking when a particular solution is deemed to be suboptimal. Compared to previous “chain-of-thought” prompting techniques, tree-of-thought gives the LLM more computation time before arriving at a final conclusion.

    As mentioned above, LLMs typically generate a token based on the preceding sequence of tokens without backward editing. On the contrary, when a human solver attempts to solve a problem, she might backtrack to previous steps if a derivation step is incorrect, or if she becomes stuck and is unable to make further progress towards arriving at the final answer.

    […]

    [The tree-of-thought framework] incorporates several components which enhance the problem solving capability of the LLM, including a prompter agent, a checker module, a memory module, and a ToT controller.
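
    As a rough illustration of that control flow (not the paper’s actual implementation), a depth-first sketch might look like the following. The propose() and check() helpers are hypothetical stand-ins for the prompter agent and checker module, each of which would call the LLM in practice.

        # Minimal depth-first sketch of tree-of-thought style search.
        # propose() and check() are hypothetical stand-ins for the paper's
        # prompter agent and checker module; both would call an LLM in practice.
        def propose(steps, k=3):
            """Ask the model for up to k candidate next steps given the work so far."""
            raise NotImplementedError

        def check(steps):
            """Grade the work so far as 'solved', 'promising', or 'dead_end'."""
            raise NotImplementedError

        def tree_of_thought(steps=(), depth=0, max_depth=8):
            status = check(steps)
            if status == "solved":
                return steps
            if status == "dead_end" or depth >= max_depth:
                return None                      # abandon this branch and backtrack
            for step in propose(steps):
                result = tree_of_thought(steps + (step,), depth + 1, max_depth)
                if result is not None:
                    return result
            return None                          # nothing worked; caller backtracks further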

    It is fascinating to think about what studying language models can teach us about our own cognition.


    Related: Loom—A “multiversal tree writing interface for human-AI collaboration”

  • § This was our last full week of school. There are only two more days of classes next week and then that’s it.

    This might be a good time to mention that I will be starting a new job at the end of June. It is a new position at my city’s science museum that touches on a little bit of everything: some programming work, new interactive exhibit design, even some curriculum development and teaching. I am excited!


    § I have spent this entire fourth quarter of the school year teaching my third and fourth grade students using the Circuit Playground microcontrollers.

    I started simple: light up the onboard LEDs. Then I added a tiny new building block each lesson: buttons, switches, RGB color codes, how each of the built-in sensors works…

    Throughout the quarter, I had three big projects.

    First: Take the knowledge you have of all of the Circuit Playground’s sensors and devise a method to detect when someone picks up the circuit board.

    The students came up with so many creative solutions. Some used the accelerometer, others used the photoresistor, a few even used the capacitive touch pads that surround the board. Most students realized that using a combination of multiple sensors works best.
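
    For a sense of what the combined-sensor approach can look like, here is a rough CircuitPython sketch. It is only an illustration: the threshold values are made up, and the actual student projects mixed and matched sensors in their own ways.

        import time
        from adafruit_circuitplayground import cp

        # Rough pickup detector combining motion, light, and touch.
        # Threshold values are made up and would need tuning on a real board.
        resting = cp.acceleration          # (x, y, z) while the board sits on the desk
        baseline_light = cp.light          # ambient light level at rest
        MOVE_THRESHOLD = 2.0               # m/s^2 of change on the z axis that counts as "moved"

        while True:
            x, y, z = cp.acceleration
            moved = abs(z - resting[2]) > MOVE_THRESHOLD
            light_changed = cp.light > baseline_light * 2 or cp.light < baseline_light * 0.5
            touched = cp.touch_A1          # capacitive pad where a hand tends to grab

            if moved or (light_changed and touched):
                cp.pixels.fill((255, 0, 0))    # alarm: someone is picking up the board
                cp.play_tone(880, 0.2)
            else:
                cp.pixels.fill((0, 16, 0))     # all clear
            time.sleep(0.1)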

    Everyone had lots of fun testing each other’s projects. I got to take on the role of “the mastermind Circuit Playground thief”. It was great.

    The second big project was to recreate the classic arcade game Cyclone, step by step. The students loved creating their own gameplay variants.

    Finally, all of this culminated in a big end-of-year project that I am especially excited about.

    Each student thought of a research question—Is the lunch room louder on Mondays or Fridays? Which group gets more exercise at recess—those playing soccer or football?—then they used the Circuit Playground boards to collect relevant data. After collecting their data, they analyzed it to see whether or not their hypothesis was correct.
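
    As an illustration of the data-collection step, here is the kind of CircuitPython logger that could tackle the recess question. The threshold, duration, and counting approach are all assumptions rather than any particular student’s project.

        import time
        import math
        from adafruit_circuitplayground import cp

        # Count samples where the acceleration magnitude spikes above a threshold
        # during one recess period. All numbers here are made-up starting points.
        MOVE_THRESHOLD = 12.0      # m/s^2; resting magnitude is about 9.8 (gravity)
        RECESS_SECONDS = 15 * 60
        movement_events = 0

        start = time.monotonic()
        while time.monotonic() - start < RECESS_SECONDS:
            x, y, z = cp.acceleration
            if math.sqrt(x * x + y * y + z * z) > MOVE_THRESHOLD:
                movement_events += 1
                cp.pixels.fill((0, 0, 64))     # flash blue so students can see it counting
            else:
                cp.pixels.fill((0, 0, 0))
            time.sleep(0.05)

        print("movement events this recess:", movement_events)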

    Overall, I have immensely enjoyed teaching with these boards.


    § One week later, I still really like the reMarkable tablet. You can really only do two things—read and draw—so it is hard to get distracted while using it. I have been reading much more than I typically would.

    It is actually easier to get new articles and books on the device than I anticipated. For the most part this is great, but it doesn’t do much to encourage me to stick with reading one thing at a time. Consequently, I have been jumping around a lot, reading a bit of everything.


    § I got beta access to Google’s new generative AI search features. The UI feels a bit busy and confusing, particularly on mobile, but the overall functionality is actually better than I expected.

    Unfortunately, it only works in Chrome and the Google iOS app right now. I can’t wait for it to come to Safari.


    § I am getting dangerously close to becoming a mushroom forager. I’ve got the books.


    § Links

  • Last month, when the viral AI-generated “Drake” song was blowing up, the musician Grimes told fans that she would split royalties “on any successful AI generated song that uses my voice.”

    I am honestly surprised by the extent to which she has followed through on this.

    Grimes subsequently released software designed to assist in the generation of these songs—“If you go to elf.tech u can upload ur voice singing or record directly into the app… It will output the same audio but with my voice.”

    She recently spoke with Joe Coscarelli at The New York Times about some of the music that has been produced so far.

    Grimes:

    People keep getting really upset, being like, “I want to hear something that a human made!” And I’m like, humans made all of this. You still have to write the song, produce the song and sing the vocal. The part that is A.I. is taking the harmonics and the timbre of the vocal and moving them to be consistent with my voice, as opposed to the person’s original voice. It’s like a new microphone.

  • Sam Altman has been spending the past few weeks advocating in favor of AI regulations.

    OpenAI:

    In terms of both potential upsides and downsides, superintelligence will be more powerful than other technologies humanity has had to contend with in the past. We can have a dramatically more prosperous future; but we have to manage risk to get there.

    […]

    We are likely to eventually need something like an IAEA for superintelligence efforts; any effort above a certain capability (or resources like compute) threshold will need to be subject to an international authority that can inspect systems, require audits, test for compliance with safety standards, place restrictions on degrees of deployment and levels of security, etc.

    To be fair, they say open source models are totally fine… as long as they don’t get too good:

    We think it’s important to allow companies and open-source projects to develop models below a significant capability threshold, without the kind of regulation we describe here… the systems we are concerned about will have power beyond any technology yet created, and we should be careful not to water down the focus on them by applying similar standards to technology far below this bar.

    Last week, Altman was in Washington DC discussing these topics with lawmakers.

    Cat Zakrzewski, The Washington Post:

    OpenAI chief executive Sam Altman delivered a sobering account of ways artificial intelligence could “cause significant harm to the world” during his first congressional testimony, expressing a willingness to work with nervous lawmakers to address the risks presented by his company’s ChatGPT and other AI tools.

    Altman advocated a number of regulations — including a new government agency charged with creating standards for the field — to address mounting concerns that generative AI could distort reality and create unprecedented safety hazards.


    In October 2022, Sam Bankman-Fried published a “draft of a set of standards” for the cryptocurrency industry. He had previously been spearheading the effort to lobby Congress to adopt similar regulatory measures industry-wide.

    Less than one month later, his exchange, FTX, declared bankruptcy. He was subsequently indicted on charges of “wire fraud, conspiracy to commit commodities fraud, conspiracy to commit securities fraud and conspiracy to commit money laundering,” among other charges.


    Mike Masnick, Techdirt:

    It’s actually kind of typical: when companies get big enough and fear newer upstart competition, they’re frequently quite receptive to regulations… established companies often want those regulations in order to lock themselves in as the dominant players, and to saddle the smaller companies with impossible to meet compliance costs.

    OpenAI should be commended for kickstarting our current generative AI development explosion and they are still, without question, the leader in this space.

    This move should be called out for what it is, though—a blatant ploy for regulatory capture.

  • It is hard to deny that Google killed it during their IO conference earlier this month. They were clearly kicked into action by AI developments spearheaded by Microsoft and OpenAI and it showed.

    Well, Microsoft held their annual Build conference yesterday. How did they respond?

    With a whimper.

    Microsoft’s close partnership with OpenAI might be their smartest move in recent memory and they are squandering it with a complete lack of any coherent product vision.

    Their big announcement was Copilot for Windows—a button on the Windows 11 taskbar that opens up what appears to be a Bing AI web view. Microsoft did note that Copilot will be able to “customize the settings” on your PC, although I am sure you will still get thrown into Control Panel if you need to accomplish anything substantial.

    The only other notable announcement is that “Browsing with Bing” will soon be the default ChatGPT experience and that ChatGPT plugins will soon be compatible with Bing AI.

    It isn’t a secret that Bing AI and ChatGPT share the same underlying model from OpenAI. And, unlike Google’s new generative AI augmented search, Microsoft didn’t put any thought into what a meaningful user experience for an AI assisted Bing would look like.

    It is just a chat window, just like ChatGPT.

    I don’t understand why I should want to use Bing AI instead. I don’t think Microsoft knows why, either.

    So Build was boring. Maybe Satya is just happy to have made Google dance. But Google is running now. They haven’t caught up yet but the gap is quickly closing.

  • The anonymous writer behind the brr.fyi blog has been writing about their experience living and working at an Antarctic research station.

    On May 12th, the sun passed below the horizon. It will be dark for the next eight months. From now until August, there will be no visitors and no resupply trips—during the long antarctic winter the environment is too harsh for safe air travel. Everyone is truly isolated.

    Oh, and when you decide to venture outside, everything is red:

    There are a number of science projects that can only happen during the winter here, because of the unique environmental characteristics of the South Pole (darkness, elevation, weather, air quality, etc). A few of these are extremely sensitive to broad-spectrum visible light.

    […]

    To protect the science experiments, we work hard to avoid any stray broad-spectrum light outside. This means all our station windows are covered, all our exterior navigation lights are tinted red, and we’re only permitted to use red headlamps and flashlights while walking outside.

    Once it becomes closer to fully dark, these lights take on a surreal quality.

    Make sure you check out the photos and videos the author shared. Surreal doesn’t even begin to describe it.

  • The New York City Public School system has decided to reverse its ban on ChatGPT.

    David C. Banks, Chancellor of the New York City Department of Education:

    In November, OpenAI introduced ChatGPT to the public, unleashing the power of generative artificial intelligence and other programs that use vast data sets to generate new and original content. Due to potential misuse and concerns raised by educators in our schools, ChatGPT was soon placed on New York City Public Schools’ list of restricted websites.

    […]

    The knee-jerk fear and risk overlooked the potential of generative AI to support students and teachers, as well as the reality that our students are participating in and will work in a world where understanding generative AI is crucial.

    […]

    While initial caution was justified, it has now evolved into an exploration and careful examination of this new technology’s power and risks.

    New York City Public Schools will encourage and support our educators and students as they learn about and explore this game-changing technology while also creating a repository and community to share their findings across our schools. Furthermore, we are providing educators with resources and real-life examples of successful AI implementation in schools to improve administrative tasks, communication, and teaching.

    Anecdotally, I have heard Snapchat’s My AI feature was a turning point. After a rocky start, it brought generative AI to an app nearly every teenager already has installed on their smartphone.

    Even without Snapchat, Google and Microsoft are currently in the process of integrating generative AI into Docs and Word. Once these features fully roll out, a prohibition on language models will lead to students cheating without ever intending to. Imagine failing a student for using autocorrect.

    The cat is out of the bag.


  • § My birthday was on Tuesday!


    § I finally caved and bought myself a reMarkable 2 tablet. Some initial thoughts after a few days of use:

    I have previously used the iPad with an Apple Pencil which is still undeniably the best touchscreen experience available today. It is also unmistakably digital. I spent a lot of upfront time fruitlessly configuring it to limit outside distractions. It became an attractive nuisance more than a tool.

    The reMarkable, on the other hand, feels like “magic paper.”

    Most of the time, it is just like writing in a notebook. But then you remember you can undo a mistake or duplicate and move a shape—it is paper+. It is by no means perfect—there are still things like accidental input that immediately break the illusion—but it is good. Most importantly, it isn’t fiddly. There are no apps to install, no settings to tweak. You can read, write, and sketch. That is it.

    Unless there are particularly meaningful iPadOS updates come WWDC, I plan to sell my iPad Pro.


    § Six years after Breath of the Wild, its sequel, Tears of the Kingdom, was finally released last week.

    The first game was easily my favorite Nintendo Switch game so I have been nervously anticipating this follow-up for quite a while.

    Honestly, I would be happy even if Tears of the Kingdom were simply an updated version of BOTW with an expanded map. Tears is so much more than that, though. The new construction mechanic, alone, opens up so many new opportunities for play and experimentation.

    The new map is enormous, too. It takes all of the familiar locations from the previous game and expands them both underground with a giant network of caves and through the air with a series of sky islands—“skylands.”

    Here is the bottom line: BOTW is a game that is worth purchasing a Switch for. I can say the same thing about Tears, without hesitation.


    § I didn’t realize scallions were just the greens from normal onion plants that you pull from the ground early. Well, the Spanish onion bulbs that I planted a few weeks ago are now providing me with an effectively unlimited supply of them.


    § I’ve caught a cold. But at least, with Zelda and the reMarkable, there have been much worse times to be stuck at home for a little bit.


    § Links

    § Recipes

    • I made baked ziti for the first time in quite a while. It is good fresh out of the oven but it is the king of leftovers. It might be twice as good the next day.
    • In some sickness-induced delirium, I started really craving buttery dinner rolls. I found a gluten-free recipe that was quick to make and turned out amazingly well.
  • David Pierce tells the story of Google’s AMP initiative:

    After a decade of newspapers disappearing, magazine circulations shrinking, and websites’ business dwindling, the media industry had become resigned to its own powerlessness. Even the most cynical publishers had grown used to playing whatever games platforms like Google and Facebook demanded in a quest for traffic…

    “If Google said, ‘you must have your homepage colored bright pink on Tuesdays to be the result in Google,’ everybody would do it, because that’s what they need to do to survive,” says Terence Eden, a web standards expert and a former member of the Google AMP Advisory Committee.

    […]

    AMP succeeded spectacularly. Then it failed. And to anyone looking for a reason not to trust the biggest company on the internet, AMP’s story contains all the evidence you’ll ever need.

    It seems important that this reporting is coming from The Verge, a publication that was dramatically redesigned last year to shift focus away from external platforms.

    Introducing the redesign, editor-in-chief Nilay Patel wrote:

    Our goal in redesigning The Verge was actually to redesign the relationship we have with you, our beloved audience. Six years ago, we developed a design system that was meant to confidently travel across platforms as the media unbundled itself into article pages individually distributed by social media and search algorithms.

    […]

    But publishing across other people’s platforms can only take you so far. And the more we lived with that decision, the more we felt strongly that our own platform should be an antidote to algorithmic news feeds, an editorial product made by actual people with intent and expertise


    As an aside, when Safari Web Extensions for mobile launched with iOS 15, various “AMP blocker” utilities immediately became hugely popular. I have had one installed ever since. If you don’t, do yourself a favor.

  • Chris Wiley, The New Yorker:

    The artist Charlie Engman is one of the few photographers who have leaned into the alien logic of the new machine age and found a way to make something that feels new.

    […]

    Engman as an A.I. artist is dizzyingly prolific. “The amazing thing about A.I. is that I can make, like, three hundred pictures a day,” he told me, “And every single one of them can be an entirely different set of characters, and new location, and new material. I’m not constrained by physical reality at all.”

    We need a better way to describe AI artwork than “photography.” It is a collage where each element is, at the same time, bespoke and not under your control. It is a new medium. Our conversations will improve once we acknowledge it as such.

    In this context, it is clear that the artistic tools used to create within this new medium are painfully immature. Engman’s “characters” are fleeting apparitions. Any minute tweak to your seed number, token choice, or model architecture, and they are gone forever.

    We should be able to recompose a frame, resituate characters in a different environment, give them a haircut, and change their wardrobe, pose, and expression. Characters should have a history. Maybe one is afraid of heights. That should be remembered next time you pose them crossing a mountain overpass. The ways that characters relate to each other and their environment should have a consistent logic while continuing to allow for surprising choices on the part of the model.

    Technologies such as ControlNet could begin to help here, but they are still early and only accessible to those with a technical background.

    Or maybe this is all the wrong attitude to take. Maybe treating AI artwork as a new medium means embracing a certain loss of control.

    It is too early to say.

  • To celebrate Global Accessibility Awareness Day, Apple announced a handful of forthcoming iOS features. One of these features really caught my eye because I think it gives us a tiny glimpse at what Apple’s platforms might look like in the future:

    Coming later this year… those at risk of losing their ability to speak can use Personal Voice to create a synthesized voice that sounds like them for connecting with family and friends.

    With Live Speech on iPhone, iPad, and Mac, users can type what they want to say to have it be spoken out loud during phone and FaceTime calls as well as in-person conversations… For users at risk of losing their ability to speak — such as those with a recent diagnosis of ALS… Personal Voice is a simple and secure way to create a voice that sounds like them.

    Users can create a Personal Voice by reading along with a randomized set of text prompts to record 15 minutes of audio on iPhone or iPad. This speech accessibility feature uses on-device machine learning to keep users’ information private and secure, and integrates seamlessly with Live Speech so users can speak with their Personal Voice when connecting with loved ones.

    Apple’s custom silicon expertise currently gives them a huge advantage when it comes to training and running personalized machine learning models locally on customers’ devices.

    I would love to see a “Siri 2.0” that utilizes on-device language models. However, as we get closer to WWDC, it has become increasingly clear that this is not the year for that. This year will almost certainly be dominated by the unveiling of their rumored XR headset. But even setting the headset aside, Apple tends to be very slow and methodical when it comes to large changes like the ones that would be required by a huge Siri revision.

    Nevertheless, I expect to see more incremental progress towards expanding on-device AI models throughout all of Apple’s platforms. We just might have to wait until iOS 18 or 19 for the big stuff.

  • Douglas, writing for the blog A Mindful Monkey, shares some observations on the task of raising a newborn:

    She is very curious

    I already pretty much know how the world works. I don’t need to create mental models of what happens when a rubber duck gets put in a cup and taken out x 100 times like she does. I don’t have a desire to poke my finger into the ethernet port. I can’t imagine having to start from ground zero and develop an understanding of everything, but I guess I did it once and so does she.

    […]

    Within the last month, she seems to have created the concept of ‘handle’ in her mind.

    Instead of grabbing objects wherever her hand lands, she reaches for parts that are best fit to grab; even if the object is new to her. The concept of a handle seems pretty innate to me and I didn’t notice it until watching my daughter. It’s just instinctual that things have handles (or at least the best ways to grab them). It’s like she has developed a sense of physics (gravity, torque, etc) without actually knowing what those things are at all.

    One more, from a follow up post:

    One time I laid down to see what her mobile looked like from her perspective when she was an infant. It looked drastically less cool from her vantage point. She has since grown and is now at knee height. I wonder about how different every room feels from that height. The kitchen is towering over you, with no idea of what is going on up there. A refrigerator so tall you can only see the bottom row of food. You can never open a door.

  • Just last Monday, I was commending Mosaic for their “StoryWriter” language model and its huge 65,000-token context window, which made OpenAI’s forthcoming 32,000-token large-context version of GPT-4 look a lot less impressive.

    Well…

    Anthropic:

    We’ve expanded Claude’s context window from 9K to 100K tokens… The average person can read 100,000 tokens of text in ~5+ hours, and then they might need substantially longer to digest, remember, and analyze that information. Claude can now do this in less than a minute.

    […]

    Beyond just reading long texts, Claude can help retrieve information from the documents that help your business run. You can drop multiple documents or even a book into the prompt and then ask Claude questions that require synthesis of knowledge across many parts of the text.
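
    For anyone curious what that looks like in code, here is a hedged sketch of dropping a whole book into a single prompt. The endpoint, parameters, and model name follow Anthropic’s documentation as I understand it; treat all of them as assumptions to verify before relying on this.

        import requests

        # Sketch: ask Claude a question about an entire book in one prompt.
        # Endpoint, parameters, and model name are assumptions based on
        # Anthropic's docs; verify against the current API before relying on this.
        API_KEY = "sk-ant-..."                       # your Anthropic API key
        book_text = open("great_gatsby.txt").read()  # roughly 70K tokens, well within a 100K window

        prompt = (
            f"\n\nHuman: Here is a book:\n\n{book_text}\n\n"
            "In two paragraphs, how does the green light function as a symbol?"
            "\n\nAssistant:"
        )

        response = requests.post(
            "https://api.anthropic.com/v1/complete",
            headers={"x-api-key": API_KEY, "content-type": "application/json"},
            json={
                "model": "claude-v1-100k",           # the 100K-context variant
                "prompt": prompt,
                "max_tokens_to_sample": 500,
                "stop_sequences": ["\n\nHuman:"],
            },
        )
        print(response.json()["completion"])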

    With Google’s announcements last week and Anthropic’s steady stream of improvements, I am beginning to wonder what OpenAI has up its sleeve.

  • § Now that we’ve finally had some nice weather, I have been able to measure that, on sunny days, the greenhouse is consistently 15-20 °F warmer than the average outdoor temperature.

    The next step is to work on insulation since, right now, all of that extra warmth dissipates almost immediately once the sun sets.


    § I’ve started reading—or, more accurately, listening to—Cory Doctorow’s new book Red Team Blues. Aside from a vague self-righteousness from the protagonist that rubs me the wrong way, I have found the book unbelievably fun so far.

    In the past, dry audiobook recordings have made it hard for me to absorb content as fully as I would if I were reading. Wil Wheaton’s energetic and unorthodox performance here was just what I needed. I can’t wait to finish it.


    § I finally got access to the ChatGPT Code Interpreter plugin which gives ChatGPT the ability to execute the code it writes. It is all extremely impressive.

    In one of my first tests, ChatGPT initially wrote buggy code that resulted in a runtime error, and it was able to detect the error, fix the bug, and re-execute the program, all without any intervention on my part.


    § We had ten more yards of compost delivered on Monday, after going through five less than a month ago.

    There is something meditative about filling a wheelbarrow by the shovelful while you watch the pile steadily shrink. It is a great way to get a little bit of exercise and a lot of sunlight. Caroline and I spent our afternoons doing just that, dispersing more than half of it throughout the garden beds before the weekend.


    § Speaking of the weekend… Caroline and I visited Fallingwater on Saturday! The guided tour was certainly worth it but the best part was freely wandering the grounds afterwards. There is a reason the house was built in that location; it is gorgeous.


    § Oh, and I got engaged!


    § Links

  • Amy Goodchild published a comprehensive primer on the emergence of computer art in the 1950s and 60s. It is inspiring to see the depth of what artists were able to create using primitive, inaccessible, and expensive technologies compared to what is available today.

    Artists have a long history of pushing the boundaries of what is possible. In recent years, the space of possibilities has expanded dramatically. I hope there are creative pioneers working today to push us even further.

  • Google held their annual IO developer conference yesterday and the theme was, unquestionably, artificial intelligence.

    Below are a few quick thoughts on what I think are the most important announcements.

    Bard & PaLM 2

    Bard is no longer behind a waitlist and the underlying model has been updated to PaLM 2.

    PaLM 2 comes in five different sizes. The smallest model, named Gecko, is optimized to run locally on mobile devices. Google didn’t specify which size is currently behind Bard.

    Like ChatGPT, “tools” are coming to Bard. Integrations with first-party Google services will be available first. Later, third-party developers, including Instacart, Adobe Firefly, and Wolfram Alpha, will have plugins available.

    “In the next few weeks” Bard will become more multimodal. It will be able to output images and will accept images as prompt inputs, similar to the (not yet available) GPT-4 multimodal capabilities.

    A new, larger, and more powerful model, Gemini, is now in training and will be added to Bard at some point in the future.

    Search

    Generative AI answers are coming to Search—Google will be adding an “AI Snapshot” section above its traditional search listings.

    Ads will be shown above the AI Snapshot box and products from Google’s Shopping Graph will be suggested as a part of its answers, when relevant.

    You will be able to engage in a freeform back-and-forth chat to elaborate on generated answers. This might be the closest anyone has come to the UI I suggested back in January.

    These new features will be available as a part of the Search Labs program “in the coming weeks.”

    It should be noted that this is not the “Google Inbox Strategy” I was optimistic about a few weeks ago. These features will be coming to Google proper—not some new experimental “Magi” interface. This is a bold move; time will tell if it is the correct one.

    Tailwind

    Although it is just a “prototype,” Tailwind might be one of the first true AI-native applications I’ve seen. Google describes it as an “AI-first notebook.” Out of everything announced yesterday, it is Tailwind that we know the least about. If it is done correctly, though, it could be an extremely compelling product.

    This is the type of weird, one-off, Inbox-style experiment I want to see more of.

    Sidekick

    In what is clearly a response to Microsoft’s new 365 Copilot feature, Google previewed “Sidekick.” In Google Docs and other Workspace applications, there will be an AI-powered side panel that offers suggestions based on information from all of your Workspace documents. I think Copilot is a great idea and there is no reason to think this will be any different.

    Google Assistant

    Notable in that there was absolutely no mention of it at the conference.


    I am sure I will have more to say about all of this once these features become public. For now, it is evident that Google feels significant pressure from Microsoft and does not intend to go down without a fight.

  • The Machine Learning Compilation blog:

    Significant progress has been made in the field of generative artificial intelligence and large language models… As it stands, the majority of these models necessitate the deployment of powerful servers to accommodate their extensive computational, memory, and hardware acceleration requirements.

    […]

    MLC-LLM [is] a universal solution that takes the ML compilation approach and brings LLMs onto diverse set of consumer devices… To make our final model accelerated and broadly accessible, the solution maps the LLM models to vulkan API and metal, which covers the majority of consumer platforms including windows, linux and macOS… Finally, thanks to WebGPU, we can offload those language models directly onto web browsers. WebLLM is a companion project that leverages the ML compilation to bring these models onto browsers.

    Their iOS app is powered by the Vicuna 7B language model. I was genuinely shocked by the inference speed on my iPhone 14 Pro. The response quality is roughly equivalent to MPT, StableLM, and other similar open-source projects—in other words, not particularly great. But, again, all of this is running locally on a phone—that is a truly impressive feat.

    One of the example use cases from the linked announcement is a bespoke AI assistant trained on each individual user’s private data. Now, this personalized assistant should run locally for privacy and security reasons, but it doesn’t have to be particularly powerful as long as it can offload difficult tasks to a more powerful, centralized assistant in a privacy-preserving manner.

    A pattern very similar to Simon Willison’s recent proposal for “Privileged” and “Quarantined” LLMs would be key here.

    In this scenario, it is less important for local models to be powerful than it is for them to be fast and energy efficient. MLC could be a step towards making this a reality.
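
    To make that division of labor concrete, here is a loose sketch of the local-first routing idea (not Willison’s exact pattern and not MLC’s API). The local_llm() and remote_llm() functions are placeholders, and the sanitization step is what keeps private data on the device.

        # Loose sketch of the local-first routing idea described above.
        # local_llm() would be a small on-device model (e.g. something MLC-compiled);
        # remote_llm() a powerful hosted model. Both are placeholders.
        def local_llm(prompt: str) -> str:
            raise NotImplementedError

        def remote_llm(prompt: str) -> str:
            raise NotImplementedError

        PRIVATE_CONTEXT = "calendar, messages, notes..."   # never leaves the device

        def answer(question: str) -> str:
            # 1. Let the local model try first, with full access to private data.
            draft = local_llm(
                f"Private context:\n{PRIVATE_CONTEXT}\n\nQuestion: {question}\n"
                "Answer if you can. If the task is too hard, reply with NEEDS_HELP: "
                "followed by a restated task that contains no private details."
            )
            if not draft.startswith("NEEDS_HELP:"):
                return draft

            # 2. Escalate only the sanitized task to the remote model.
            sanitized_task = draft.removeprefix("NEEDS_HELP:").strip()
            background = remote_llm(sanitized_task)

            # 3. The local model folds the remote answer back into the private context.
            return local_llm(
                f"Private context:\n{PRIVATE_CONTEXT}\n\nQuestion: {question}\n"
                f"Useful background from a general-purpose model:\n{background}\n"
                "Write the final answer."
            )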

  • AI researchers from Google’s DeepMind team used reinforcement learning to teach simulated robots how to play soccer. They then managed to transfer those learned abilities onto real, physical robots. We are left with not only impressive research, but a bunch of videos of super cute, tottering robots playing soccer with each other.

    DeepMind:

    Our agents, with 20 actuated joints, were trained in simulation using the MuJoCo physics engine, and transferred zero-shot to real robots… The trained soccer players exhibit robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more… The agents also developed a basic strategic understanding of the game, learning to anticipate ball movements and to block opponent shots.

    As I mentioned above, the linked page is full of videos that are all fascinating and impressive; if nothing else, don’t miss the clip where a researcher repeatedly pushes down an adorable, tiny, humanoid robot as it runs around playing with a soccer ball. Her discomfort with each push is palpable.
