Don’t use the wrong AI for the job

Plus, what if “tapestry” keeps you out of college?

Issue 66

On today’s quest:

— Which type of model should you use?
— Getting the most out of deep research
— LLMs fail at fact-checking scientific papers
— Cloudflare is blocking AI crawlers by default
— The AI doctor is in … and kicking butt
— Your thoughts on changing your writing

Which type of model should you use?

Researchers at Apple’s Machine Learning Research division have a new paper comparing the newer reasoning models (like OpenAI’s o3 and DeepSeek-R1) with old-school LLMs (like GPT-4o). They found the following:

  • Standard LLMs (e.g., GPT-4o, Claude 3.7 Sonnet) do better on simple tasks

  • Reasoning models (e.g., OpenAI’s o3, DeepSeek-R1) do better on moderately complex tasks

  • Both do a bad job on very complex tasks.

The researchers tested each model on four puzzles that mostly involved moving pieces around (one is the classic Tower of Hanoi), and in most cases they could increase a puzzle’s complexity by adding pieces to it. Past a certain number of added pieces, every model failed.

These tests probably aren’t much like the work you’ll be doing with AI, but they’re another data point suggesting that if you have a simple task, you’ll get a better answer from a non-reasoning model.

Also, if you have a task that is extremely complex, with lots of steps, you’ll want to try a reasoning model, but you’ll likely get better results if you can break it up into smaller tasks.
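To make that complexity cliff concrete: a correct solution to the Tower of Hanoi puzzle mentioned above requires 2^n - 1 moves for n disks, so every added piece doubles the amount of flawless output a model has to produce. Here’s a minimal sketch of that growth (my illustration, not the researchers’ code):

    # Illustration only (not the researchers' test harness): a correct
    # Tower of Hanoi solution takes 2^n - 1 moves, doubling per added disk.
    def hanoi(n, src="A", dst="C", via="B"):
        """Yield the moves that solve an n-disk Tower of Hanoi."""
        if n == 0:
            return
        yield from hanoi(n - 1, src, via, dst)  # park n-1 disks on the spare peg
        yield (src, dst)                        # move the biggest disk
        yield from hanoi(n - 1, via, dst, src)  # restack the n-1 disks on top

    for disks in [3, 6, 9, 12]:
        moves = sum(1 for _ in hanoi(disks))
        print(f"{disks} disks -> {moves} moves")  # 7, 63, 511, 4095

A model that has to emit every one of those moves without a single mistake is fighting exponential growth, which fits the finding that accuracy collapses past a threshold.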

How to work with AI: Getting the most out of Deep Research

Torsten Walbaum at Operator’s Handbook put together a detailed how-to for using different models for deep research on tasks like understanding new topics, comparing products and vendors, deciding whether to turn your side hustle into a company, and so on.

He highlights the places where the models need extra hand-holding to get the best results, compares the models so you can choose the best one for your particular task, and offers prompting advice so you don’t accidentally waste research credits (each system limits the number of queries you can run against these energy-intensive products).

If you want to use AI for deep research, I highly recommend reading the whole thing.

LLMs fail at fact-checking scientific papers

I’ve previously recommended LLMs for fact-checking and continue to use them that way, but here’s a reminder that they shouldn’t be your only method:

When researchers fed 83 published papers with known errors to a range of LLMs, the best performer, OpenAI’s o3, detected only 21% of the errors. The result matters because some journals are starting to look into using LLMs as part of peer review. The researchers did not test how well humans would do at fact-checking the same papers. — arXiv


Cloudflare is blocking AI crawlers by default

Backend internet giant Cloudflare announced Tuesday that, going forward, it will block AI crawlers by default. About 16% of the world’s internet traffic goes through Cloudflare’s servers. Further, the company plans to set up a Pay Per Crawl program through which customers can charge AI companies for permission to scrape their websites.

This seems like it could be a big deal. Cloudflare CEO Matthew Prince said, “AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators, while still helping AI companies innovate.” — CNBC

However, people are also concerned that the Pay Per Crawl program could create problems for open-source crawling projects like Common Crawl, which reminds me of a discussion I had with Erin McKean about the way anti-AI initiatives could harm language researchers.
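For context on what “blocking” replaces: the traditional opt-out mechanism is a site’s robots.txt file, which crawlers check voluntarily, whereas Cloudflare’s change enforces the restriction at the network level. Here’s a minimal sketch, using Python’s standard library, of the voluntary check a well-behaved crawler performs. The bot names are real published user agents, but example.com and the article URL are placeholders:

    # Sketch of the voluntary robots.txt check a polite crawler makes
    # before fetching a page. example.com stands in for any site.
    from urllib.robotparser import RobotFileParser

    robots = RobotFileParser()
    robots.set_url("https://example.com/robots.txt")
    robots.read()  # download and parse the site's robots.txt

    # GPTBot (OpenAI), CCBot (Common Crawl), and ClaudeBot (Anthropic)
    # are real AI-crawler user agents.
    for bot in ["GPTBot", "CCBot", "ClaudeBot"]:
        allowed = robots.can_fetch(bot, "https://example.com/some-article")
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")

The catch is that nothing forces a crawler to run this check, which is why a network-level block like Cloudflare’s changes the power dynamic: it doesn’t depend on the crawler behaving.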

The AI doctor is in … and kicking butt 

In a recent test, a Microsoft tool using OpenAI’s o3 reasoning model correctly diagnosed 85% of medical cases, whereas doctors got only 20% right.

Although the results are impressive, it’s important to note that the doctors were at a disadvantage because they were asked not to use outside sources, which is something they often do in practice — talking to colleagues, doing their own web research, etc.

Also, because these were cases taken from the New England Journal of Medicine, they were more complicated than the cases most physicians usually see. The head researcher described them as “some of the toughest and most diagnostically complex” cases a physician can face. — MSN

Your thoughts

Thank you for taking the poll in the last newsletter about whether people should change their writing in business settings when things like the em dash or the word “delve” become associated with AI writing. Here are the results:

And here are your comments:

From people who voted “no”:

💡 “I flatly refuse to have grammar and language use be driven by nascent alpha-test level usage of computer tools.”

💡 “I don't think they should change the way they write in business settings. LLMs are evolving all the time, and if we change every time a certain word becomes ‘AI trendy,’ we will drive ourselves nuts. My approach---keep writing like I always do.”

💡 “I don’t want to dumb down my writing just because other people might make wrong assumptions.”

💡 “No because it will take more than the rise of AI to steal my beloved em dashes.”

From someone who voted “yes”:

💡 “If you're writing like an AI, you're probably writing badly - or at least derivatively. Why not at least pretend to care about your work?”

From people who voted “it’s complicated”:

💡 “I'm one of the people who railed against the sudden denigration of the em-dash. That was before I read your post about doing that feeding the controversy. I think it's complicated because changing the way people write can create growth and improvement. But that doesn't mean they should stop using perfectly good punctuation marks. Instead, writers should educate the public. Which goes back to feeding controversy. It's complicated.”

💡 “My first instinct is the same as yours -- don't change your style! -- but the possibility of negative real-world consequences has to be seriously considered.”

💡 “If you’re a sucky writer and ChatGPT is essentially mimicking your bad writing habits (endless lists of 3? chronic repetition? boring-as-shit sentence structures?), you should work to change them. If you’re a good writer who likes an em dash, do what you like!”

I’m still worried

I appreciate your feedback, and I continue to think about this question. For example, here’s a quote that gives me great pause. It comes from a college essay consultant in a piece about kids using ChatGPT to write their college essays, with an emphasis on words that have become AI tells:

Not getting admitted to college is real harm, and I don’t doubt that this will happen given how narrow these decisions can be.

Right now, I’m thinking I will tell people what I am doing (not changing), make sure they understand the risks, and let them make up their own minds. In other words, I’m softening my stance a bit.

Quick Hits

Using AI

Philosophy

I’m scared

I’m laughing

Other

Embarrassing AI errors

Job market

What is AI Sidequest?

Are you interested in the intersection of AI with language, writing, and culture? With maybe a little consumer business thrown in? Then you’re in the right place!

I’m Mignon Fogarty: I’ve been writing about language for almost 20 years and was the chair of media entrepreneurship in the School of Journalism at the University of Nevada, Reno. I became interested in AI back in 2022 when articles about large language models started flooding my Google alerts. AI Sidequest is where I write about stories I find interesting. I hope you find them interesting too.

Written by a human