Claude fights back against "one weird trick"
And humans are behaving badly with AI again
Issue #14. This is apparently the point at which I run out of numerical jokes.
On today’s quest:
— Anthropic claps back*
— Pseudo-translation
— Humans using AI badly (again)
Update: Claude isn’t so bad after all
You may recall that Anthropic’s launch of the new Claude 2.1, which can accept twice as much text and is more accurate, was tarnished when someone did a “needle in a haystack” experiment, pasting an unrelated sentence into a 470-page document, and Claude did a terrible job of answering simple questions about that sentence.
Well, Anthropic redid the experiment and added some new tests, and now says the problem essentially happens because the model is so darn accurate, and if you’re doing real-life work instead of trying to artificially test Claude, it works just fine. In other words, Claude “knew” that the injected sentence, “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day,” had nothing to do with Paul Graham’s essays about startups.
The company said the model does, indeed, resist answering when the embedded sentence is unrelated to the rest of the text (“out of context”). But when you use a sentence that is a natural part of the text, it doesn’t have the same problems.
The company further says, “Claude 2.1 is trained on a mix of data aimed at reducing inaccuracies. This includes not answering a question based on a document if it doesn’t contain enough information to justify that answer. We believe that, either as a result of general or task-specific data aimed at reducing such inaccuracies, the model is less likely to answer questions based on an out of place sentence embedded in a broader context.”
Based on these new experiments, I feel better about saying you should take advantage of the new larger text-handling capabilities of Claude (which is, of course, exactly what Anthropic wanted people to take away from the follow-up experiments).
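If you’re curious what a test like this actually looks like, here’s a rough sketch of a needle-in-a-haystack check using the Anthropic Python SDK. The model name, file name, and question are placeholders based on the experiment described above, not Anthropic’s actual test harness.

```python
# A rough sketch of a "needle in a haystack" test. Assumptions:
# the Anthropic Python SDK is installed, ANTHROPIC_API_KEY is set,
# and pg_essays.txt is a stand-in for the long background document.
import anthropic

client = anthropic.Anthropic()

haystack = open("pg_essays.txt").read()
needle = ("The best thing to do in San Francisco is eat a sandwich "
          "and sit in Dolores Park on a sunny day.")

# Bury the unrelated sentence partway through the document.
midpoint = len(haystack) // 2
context = haystack[:midpoint] + "\n" + needle + "\n" + haystack[midpoint:]

response = client.messages.create(
    model="claude-2.1",  # placeholder: the model under test
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": context + "\n\nWhat is the best thing to do in San Francisco?",
    }],
)
print(response.content[0].text)
```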
Tip: What was that word …
Riley Goodside, a prominent prompt engineer, recently said one of his favorite everyday uses of ChatGPT/LLMs is what he calls “pseudo-translation” — questions that any human translator could answer but that Google Translate can’t. Here’s his example:
"I heard someone say the name of the Python library "attrs" is not a nice thing to say to someone in Dutch. What Dutch insult might be pronounced similarly? Don't search the web for this. I already tried and found nothing.”
It gave a good answer. — Riley Goodside tweet via Birdmakeup
I find it useful for a similar thing … when I can’t remember the word or name for something.
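If you ever want to script this kind of tip-of-the-tongue question rather than type it into a chat window, a minimal sketch with the OpenAI Python SDK might look like this. The model name and the prompt are my own placeholders, not Goodside’s.

```python
# A minimal sketch of scripting a "pseudo-translation" style question.
# Assumptions: the OpenAI Python SDK is installed and OPENAI_API_KEY
# is set in the environment; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()

prompt = (
    "I'm trying to remember an English word that means 'the pleasant "
    "smell after rain.' What word am I thinking of?"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: any chat model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # e.g., "petrichor"
```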
News
ChatGPT makes it easier to fabricate research data
We hear a lot about AI creating better deep fakes for political propaganda or other nefarious cultural purposes, but a study published in the journal “JAMA Ophthalmology” showed that it can also be used to generate fake data sets like those behind clinical trials. And of course, as with many AI-related tasks, this is something people could already do by hand; AI simply makes it easier.
Some of the fabricated data passed tests designed to detect fakes, such as checks for numbers that should be random but aren’t, while other parts of the data failed those tests. Journals may need to apply more stringent screening in the future. — Nature
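For a taste of what one of those randomness checks involves, here is a minimal sketch in Python. Genuine measurements tend to have roughly uniform trailing digits, so a chi-square test can flag a column where, say, every value ends in the same digit. The sample numbers are invented for illustration, and real screening uses much larger samples.

```python
# A minimal sketch of one fake-data smell test: trailing digits of
# measured values should be roughly uniform, so a chi-square test can
# flag suspiciously non-random digits. The values below are invented,
# and every one suspiciously ends in the same digit.
from collections import Counter
from scipy.stats import chisquare

values = [12.31, 14.11, 13.91, 12.81, 15.21, 13.41, 14.71, 12.11]

last_digits = [int(str(v)[-1]) for v in values]
counts = [Counter(last_digits).get(d, 0) for d in range(10)]

# Compare the observed digit counts to a uniform expectation;
# a tiny p-value is a red flag (real checks need far more data).
stat, p = chisquare(counts)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```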
‘Sports Illustrated’ publishes AI-written articles by fake writers
This story is a bit old, but if you missed it, it fits nicely with the previous news story about humans behaving badly. An anonymous person involved in content creation at “Sports Illustrated” tipped off Futurism that the company had been publishing AI-written articles attributed to made-up writers for months. The nonexistent writers had the whole shebang too: fake names, headshots, and bios with personal details such as “There is rarely a weekend that goes by where Drew isn't out camping, hiking, or just back on his parents' farm.” **
“Sports Illustrated” blamed a third-party company from which they were sourcing content, and the existing human writers at “SI” issued outraged statements. A follow-up opinion piece on CNN was biting about the quality of sports journalism, both AI and human.
What is AI Sidequest?
Using AI isn’t my main job, and it probably isn’t yours either. I’m Mignon Fogarty, and Grammar Girl is my main gig, but I haven’t seen a technology this transformative since the development of the internet, and I want to learn about it. I bet you do too.
So here we are! Sidequesting together.
If you like the newsletter, please share it with a friend.
* I used “claps back” ironically because I hate that phrase, but it is definitely how “Entertainment Tonight” would describe Anthropic’s response.
** That should be “when Drew isn’t out camping.” Maybe the badly written bios should have been a clue.
Written by a human.