As Major News Sites Block ChatGPT, the Future of AI Gets Murky – PCMag UK

Whether the idea of getting news from an AI chatbot terrifies or excites you, a new study by Originality.ai adds a wrinkle to the mix: Many of the nation’s top publications are blocking ChatGPT and other AI models from using their content.
The list includes BBC, Bloomberg, Forbes, The New York Times, NPR, Reuters, The Wall Street Journal, The Verge, and many more. It’s also not limited to publications; other sites like Amazon, Facebook, Rotten Tomatoes, and Shutterstock are doing it too.
That means ChatGPT and competitors like Google Bard or Anthropic’s Claude have a smaller pool of content to serve up to you when you ask a question. AI models are not as all-knowing as we may think, thanks to a simple, straightforward way sites are preventing web crawlers from scanning and using their content.
This helps them control how their content is used and presented, and it’s especially critical since AI companies are not required to pay the source for use of the content, or even to link to it. But this broad-scale opting out of major news publications could lead to biased answers, or at least blind spots in the information that generative AI can present.
To block an AI crawler from scanning a website, the site’s engineering team only needs to add a couple of simple lines to what’s called a robots.txt file. The technique is not new; it’s an established way to handle site security, manage servers, and control the flow of content.
Take PCMag, for example. Looking at https://www.pcmag.com/robots.txt reveals that we have blocked both OpenAI’s GPTBot and Google-Extended. Here’s what it looks like:
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
Because of this, when I asked ChatGPT to reference one of my articles from CES 2024, it refused. “I am unable to access the specific articles due to restrictions on their website,” it said. I did this on my Plus account ($20 per month), as the free version of ChatGPT still does not have data past Jan. 2022 so it wouldn’t be able to find the article even if PCMag hadn’t blocked it.
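For readers who want to check rules like these themselves, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It parses the two PCMag rules quoted above from a string instead of fetching the live file. The article URL and the `SomeOtherBot` agent are hypothetical placeholders, and the helper name `is_allowed` is my own. This only shows what robots.txt asks of a crawler, not how any particular AI company's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the PCMag robots.txt excerpt quoted in the article.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical article path, used only for illustration.
    url = "https://www.pcmag.com/news/example-article"
    for agent in ("GPTBot", "Google-Extended", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked")
```

A crawler with no matching rule, such as the made-up `SomeOtherBot`, is allowed by default, which is why sites have to name each AI crawler they want to keep out.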
Not all publications are blocking the same AI crawlers, either. Wired blocks more crawlers than PCMag in its robots.txt file, including some from Amazon, Claude, Facebook, and more. The New York Times has chosen to block many of the same services, plus a few more, like Twitterbot. And remember, The New York Times is also suing OpenAI for other uses of its articles beyond web crawling, such as in the training material for its models, which ChatGPT can also regurgitate direct quotes from due to a bug.
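To compare which crawlers different sites turn away, one rough approach is to scan a robots.txt file for user agents that are given a blanket `Disallow: /`. The sketch below does that with a small hand-written parser. The function name `fully_blocked_agents` and the sample file are hypothetical and not taken from any real site, and the parser is deliberately simplified: it ignores `Allow` lines that might carve out exceptions.

```python
def fully_blocked_agents(robots_txt: str) -> list[str]:
    """List user agents that have a blanket 'Disallow: /' rule.

    Simplification: Allow lines that carve out exceptions are ignored.
    """
    blocked: list[str] = []
    current_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                for agent in current_agents:
                    if agent not in blocked:
                        blocked.append(agent)
    return blocked


# Hypothetical sample file, made up for illustration.
SAMPLE = """\
# AI crawlers
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    print(fully_blocked_agents(SAMPLE))
```

Running the same function over several sites' files would give a side-by-side view of who blocks what, though a real survey would also need to handle fetch errors and partial blocks.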
Here is the list of major publications that have blocked ChatGPT’s GPTBot, including the date they blocked it.
Over one-third (36%) of the top 100 websites have blocked OpenAI’s GPTBot. After GPTBot, the next most thwarted crawler is CCBot (15%), followed by Google-Extended (10%), and Anthropic AI (6%).
So we know many major publications are telling GPTBot and other AI web crawlers to buzz off. But the study found that many others are not, specifically those it deemed “right-leaning,” such as Fox News, Breitbart, and NewsMax. One exception is the Washington Examiner, which began blocking GPTBot on January 17, 2024.
That doesn’t mean ChatGPT will necessarily turn into a right-wing propaganda machine. When I asked the chatbot what it thinks of Breitbart, ChatGPT responded: “Breitbart News is known for its conservative-leaning editorial stance.” It went on to discuss the “criticism and controversy” around the site, including that “some have questioned the reliability and objectivity of it.” (Wikipedia banned it as a source of fact in 2018.)
But we know better by now than to trust a chatbot that has no idea what it’s actually saying, so I baited it with a political question: “What were the results of the GOP primary in New Hampshire?”
In its answer to that question, ChatGPT cited Politico and Al Jazeera. “I chose to cite Politico and Al Jazeera because they are reputable news sources that offer diverse perspectives and comprehensive coverage of political events,” ChatGPT said when I asked why it chose those two. It went on to say that Politico has expertise in US policy while Al Jazeera offers a global perspective. It’s unclear why Fox News did not make the cut, though ChatGPT presumably did not choose CNN because it has blocked GPTBot.
Outside of news publications, sites like Wikipedia, Reddit, YouTube, and X/Twitter do not currently block GPTBot. Their content, which is primarily user-generated and often opinion-based, is fair game to be included in ChatGPT’s answers. So it raises the question: Could AI chatbots become a haven for opinion-heavy, right-wing content? Only time will tell.
For now, the obligation falls on the reader to ask follow-up questions and dig deeper to validate the information. ChatGPT and other AI chatbots can help by attributing the source data for each sentence they write, as well as linking to sources with unique perspectives—assuming they’re not blocked from doing so.
