How to Block OpenAI's New AI Training Web Crawler

OpenAI, the creator of ChatGPT and other AI systems, has introduced a new web crawler named GPTBot, which gathers web data that may be used to train future AI models such as GPT-5. Web crawlers support AI training by scanning websites and collecting data, but site owners can block GPTBot's access.

OpenAI's Web Crawlers

ChatGPT is a highly capable AI system that is continually being enhanced, although concerns about its intelligence have emerged. To help train the large language models (LLMs) behind it, such as GPT-3.5 and GPT-4, OpenAI employs web crawlers.

Web crawlers index website content and are used by both search engines and AI developers. For AI developers, they speed up the collection of the extensive data needed to train LLMs.

OpenAI's GPTBot Access

According to OpenAI, allowing GPTBot, the new web crawler, to access a site can help improve AI models' accuracy and safety. OpenAI says it has safeguards in place to exclude paywall-restricted, personally identifiable, or policy-violating content.

How to Block GPTBot

For developers who wish to prevent GPTBot from accessing their sites, OpenAI provides instructions.

GPTBot's access can be tailored by permitting it to crawl specific sections of a site while blocking others.

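This is done by adding a "User-agent: GPTBot" entry to the site's robots.txt file and listing the access rules beneath it.

Full stop = "User-agent: GPTBot" followed by "Disallow: /".

Partial stop = "User-agent: GPTBot" followed by "Allow: /directory-1/" and "Disallow: /directory-2/" (and so on for other directories).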
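To sanity-check rules like these before publishing them, one option is to run them through Python's standard-library robots.txt parser. The sketch below is only an illustration: the example.com URLs, the directory names, and the helper `gptbot_can_fetch` are hypothetical. It also shows how Python's `urllib.robotparser` reads the rules, which may not match GPTBot's own behavior in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, written out as they would appear in the file.
FULL_BLOCK = """\
User-agent: GPTBot
Disallow: /
"""

PARTIAL_BLOCK = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets the GPTBot user agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    # Full stop: every path is off limits to GPTBot.
    print(gptbot_can_fetch(FULL_BLOCK, "https://example.com/blog/post"))  # False
    # Partial stop: directory-1 is allowed, directory-2 is blocked.
    print(gptbot_can_fetch(PARTIAL_BLOCK, "https://example.com/directory-1/a.html"))  # True
    print(gptbot_can_fetch(PARTIAL_BLOCK, "https://example.com/directory-2/a.html"))  # False
```

Because each group of rules applies only to the user agent named above it, these entries leave other crawlers, such as search engine bots, unaffected.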

Is OpenAI Coming for Your Data?

Though it's unclear whether GPTBot contributed to training existing LLMs like GPT-3.5 and GPT-4, it might be utilized for training GPT-5 in the future. OpenAI has sought to trademark the name "GPT-5," hinting at its potential development. While GPT-5's release date remains undisclosed, it is expected to surpass GPT-4 in size and capability.

OpenAI has faced legal challenges regarding data usage, with accusations of data infringement. Websites like Stack Overflow, Reddit, and Twitter have considered charging AI entities for data access in response.
