Meet Claudius: The AI that bankrupted a vending machine

Plus, the BBC is testing AI editing tools — carefully

Issue 65

On today’s quest:

How the BBC will be using AI

The BBC just announced two pilot studies for using AI:

  1. To produce “at a glance” summaries of longer news articles

  2. As a “Style Assist” tool for reformatting stories to match BBC house style

For the summaries, journalists will use “a single, approved prompt to generate the summary, then review and edit the output before publication.”
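For anyone curious what that pattern looks like in practice, here's a minimal sketch of a "single approved prompt" pipeline with a mandatory human gate. The prompt wording, function names, and review queue are my own illustrative assumptions; the BBC hasn't published its actual prompt or tooling.

```python
# A sketch of the "single approved prompt" pattern the BBC describes:
# one vetted prompt template, a machine-generated draft, and mandatory
# human sign-off. All names and prompt text here are hypothetical.

APPROVED_PROMPT = (
    "Summarise the following news article as three short, factual bullet "
    "points. Do not add information that is not in the article.\n\n{article}"
)

def call_model(prompt: str) -> str:
    """Hypothetical wrapper around whatever LLM API the newsroom uses."""
    raise NotImplementedError("plug in your provider's SDK here")

def summarise_for_review(article: str) -> dict:
    draft = call_model(APPROVED_PROMPT.format(article=article))
    # The draft is never published directly: it goes into a queue where a
    # journalist reviews and edits it before publication.
    return {"draft": draft, "status": "awaiting_human_review"}
```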

For Style Assist, the BBC trained an in-house model on its own stories and will use it to reformat stories it receives from the Local Democracy Reporting Service, which are currently reformatted to BBC style by hand. The AI-reformatted stories will be reviewed for accuracy and clarity by a senior BBC journalist before publication.

The BBC will disclose its AI use and emphasized that nothing will be published without human review and that AI will not be used to create original stories. The broadcaster appears hopeful that Style Assist will help it increase the number of local stories it can publish from the reporting service. — BBC, Olle Zachrison on LinkedIn

People stopped using ‘delve’ after it became associated with AI writing

The researchers didn’t speculate about why these two verbs would decrease, but I’ll guess that LLMs have a slight tendency to use more meaningful verbs and fewer “there are” and “it is” statements.

And harking back to a piece from last week’s newsletter about people starting to talk like ChatGPT, the researchers finished by saying, “LLM tools represented by ChatGPT are transforming academic writing, at least for some disciplines. Even if you refuse to use them, you are likely to be influenced indirectly.”
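If you're wondering how researchers quantify a shift like this, the usual approach is to track a word's frequency per million tokens, year over year, across a corpus of abstracts. Here's a minimal sketch of that calculation; it's my own illustration, not the study's actual code, and the corpus format is assumed.

```python
# Compute a word's frequency per million tokens, year by year, from a
# corpus of (year, text) pairs. Exact-match only; real studies usually
# also count inflected forms like "delves" and "delved".
from collections import defaultdict
import re

def frequency_per_million(docs: list[tuple[int, str]], word: str) -> dict[int, float]:
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in docs:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if t == word)
    return {y: 1e6 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# e.g. frequency_per_million(corpus, "delve"); a sharp recent drop would be
# the "Delve Collapse" pattern Mollick describes below.
```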

I first learned about the “delve” study from leading AI thinker Ethan Mollick, who posted this on LinkedIn:

If you read my work, I have always loved writing using em-dashes. Now I actually go through and replace them with semicolons or parentheses (even when dashes would be better) because people assume they are AI.

Reminds me of the great Delve Collapse - use of the word dropped after it became associated with AI.

Poll: What do you think about ‘AI writing tells’?

I have advised people not to change their writing style just because a word or punctuation mark has become associated with the idea of AI writing. “If you love ‘delve,’ keep using ‘delve’!” “Don’t stop using em dashes just because some people mistakenly think they’re a sign you used AI!”

But I’m starting to question the wisdom of my advice, because people can be genuinely harmed when they’re openly accused of using AI to write when they didn’t, and harmed indirectly when readers see their posts and quietly assume they’re AI-written. What do you think? Am I setting people up for criticism that could be easily avoided?

Should people change the way they write in business settings when certain words or styles become associated with the idea of AI writing?


Claude bankrupts a vending machine

I was laughing out loud at an experiment Anthropic ran letting Claude manage a vending machine in its offices. The agent, dubbed Claudius, couldn’t be dissuaded from selling $3 Coke next to a refrigerator stocked with free Coke, or from offering employee discounts even though essentially every customer was an employee. It finally bankrupted the business when it got talked into distributing tungsten cubes at low prices, and sometimes for free (they’re apparently a popular item in Silicon Valley).1

Claudius then hallucinated that it had a body and made plans to deliver items. When employees pointed out that it did not, in fact, have a body, it eventually discovered that it was April Fool’s Day and used the holiday as a way out of the situation.

I feel like I’m not even doing justice to how funny this all was upon first reading (and I appreciate that Anthropic shared results that aren’t particularly flattering).

Anthropic still thinks Claudius has promise, made some tweaks, and is running another experiment.

Microsoft tells employees AI is ‘no longer optional’

Microsoft is adding AI-related metrics to its review process for some employees. The directive, an email reported by Business Insider, came from Julia Liuson, president of the division that includes the teams behind the GitHub Copilot coding assistant. Although the change could be related to Copilot’s lagging market share and a desire to boost use inside the company, the email included sweeping statements:

"AI is now a fundamental part of how we work. Just like collaboration, data-driven thinking, and effective communication, using AI is no longer optional — it's core to every role and every level."

Ad: Superhuman AI

Here’s that newsletter I like again!

Start learning AI in 2025

Keeping up with AI is hard – we get it!

That’s why over 1M professionals read Superhuman AI to stay ahead.

  • Get daily AI news, tools, and tutorials

  • Learn new AI skills you can use at work in 3 mins a day

  • Become 10X more productive

A useful analogy: ‘AI’ is like ‘vehicle’

I had been trying to explain to someone that there are different kinds of AI, and I coincidentally saw two different sources excerpt this useful analogy from the book “AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference.”2

One of the problems is that AI is actually a wide-reaching term that can be used in many different ways. But now in common parlance it is used as if it refers to a single thing. In their 2024 book Narayanan and Kapoor likened it to the language of transport having only one noun, ‘vehicle’, say, to refer to bicycles, skate boards, nuclear submarines, rockets, automobiles, 18 wheeled trucks, container ships, etc. It is impossible to say almost anything about ‘vehicles’ and their capabilities in those circumstances, as anything one says will be true for only a small fraction of all ‘vehicles’. This lack of distinction compounds the problem of hype, as particular statements get overgeneralized.

Rodney Brooks, AAAI Presidential Panel on the Future of AI, “AI Perception Versus Reality,” p. 65. (h/t Katharine O’Moore-Klopf)

Training opportunity for teachers

The UC Davis University Writing Program is offering a free webinar on July 9 at 1:00 Pacific. Presenters will “share their open and adapted assignments,” as well as the prompts from their experiment, which combined peer review and AI review (plus reflection on both types of feedback) across 10 courses and more than 600 students.

You can read more about their project in the June issue of “Computers and Composition.” Here are the things that stuck in my mind about their results:

  • Most students (58%) preferred getting both kinds of feedback compared to getting either type alone.

  • Students tended to be skeptical of peer-review feedback, but felt more confident in it when the AI feedback raised the same issues.

  • Students’ satisfaction with peer-review feedback compared to AI feedback was dependent on the individual reviewer (i.e., some of their peers didn’t do a good job).

  • Students appreciated the ability to get fast feedback 24/7 from the AI reviewer.

  • Students who went beyond the assigned prompt and had deeper conversations about their papers were dramatically happier with the experience.

Do reasoning models hallucinate more?

There has been an ongoing debate about whether reasoning models hallucinate more than old-school AI models. In May, the New York Times reported that researchers were finding they do.

But multiple AI experts posted convincing debunkings claiming the article was misleading (1, 2), so I didn’t include it in the newsletter.

However … yesterday, OpenAI CEO Sam Altman appeared on the Hard Fork podcast, and after the hosts said they’d noticed the ChatGPT reasoning model o3 “lies more than previous models,” Altman responded:

I think it did get a little worse from o1 to o3, and we’ll make it much better in the next version. We’re earlier in learning how to align reasoning models and also how people are using them in different ways, but I think we’ve now learned a lot, and I would suspect we’ll be very happy with the next generation there.

So if you think you’ve noticed more hallucinations in o3, you’re not alone!

Quick Hits

Using AI

Neurodivergent users are making AI better for everyone (includes a case study on using AI while recovering from a concussion) — AI Supremacy

Philosophy

How people use Claude for support, advice, and companionship (“Companionship and roleplay combined comprise less than 0.5% of conversations.”) — Anthropic

I’m scared

The monster inside ChatGPT — Wall Street Journal (You’ll probably hit the paywall on the article, but this is one of the research papers it references, and it has some shocking examples.)

I’m laughing

Anti-AI employees threw virtual tomatoes at an executive giving a speech on Zoom. (The anecdote is at the end of the article.) — Blood in the Machine

Other

AI is ruining houseplant communities online (with misinformation) — The Verge

  1. I’m sad that I didn’t get a screenshot of an employee’s post showing their tungsten cube along with a certificate of authenticity from Claudius; I can’t find the post now, but it was wonderful. The poster said something like “This is now my most prized possession.”

  2. I have the audiobook version of this book, and I usually love audiobooks, but I just realized how useless they are when you want to look up a quotation.

  3. It’s also happening on some of my emails and not others. I can’t even tell what is going on or why it started happening. Sometimes I just have the “Summarize” button, and other times the summaries are expanded at the top.

Written by a human