BlockNews

OpenAI Set to Unleash New Web Crawler to Devour More of the Open Web

by BlockNews Team
August 10, 2023
in MEDIA, SOCIAL, TECHNOLOGY
  • OpenAI has introduced GPTBot, a web crawling bot, to gather data for training its upcoming AI systems, possibly named “GPT-5”.
  • GPTBot collects public data from websites much as search engine crawlers do, but web publishers can keep their content out by adding a “disallow” rule.
  • The release of GPTBot raises concerns about consent and copyright, highlighting the ongoing challenges in balancing AI capabilities with ethical considerations.

Leading AI firm OpenAI has released a new web crawling bot, GPTBot, to expand the dataset for training its next generation of AI systems, and the next iteration appears to have an official name. The company has filed a trademark application for the term “GPT-5,” implying an upcoming release, while informing web publishers how to keep their content out of its massive corpus.

According to OpenAI, the web crawler will collect publicly available data from websites while avoiding paywalled, sensitive, and prohibited content. However, like the crawlers behind search engines such as Google, Bing, and Yandex, the system is opt-out: by default, GPTBot will assume all accessible information is fair game.

To prevent OpenAI’s web crawler from ingesting a website, the website’s owner must add a “disallow” rule for the GPTBot user agent to the site’s robots.txt file, the standard file that tells crawlers which paths they may access.
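
As a rough illustration of the opt-out rule described above, the sketch below embeds a two-line robots.txt that disallows the GPTBot user agent and checks it with Python's standard urllib.robotparser module. The helper name is_allowed, the example.com URL, and the choice to block the whole site are assumptions made for illustration; a publisher could just as well disallow only selected paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block GPTBot from the entire site.
# Crawlers that are not named here are unaffected by this rule.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-article"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

Note that robots.txt is a voluntary convention: the rule only works to the extent that the crawler chooses to honor it, which OpenAI says GPTBot does.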

According to OpenAI, data scraped by GPTBot will also be filtered to remove personally identifiable information (PII) and text that violates its policies.

However, some technology ethicists believe the opt-out approach still raises consent challenges.

On Hacker News, some users defended OpenAI’s move, arguing that anyone who wants a capable generative AI tool in the future must accept that its developers need to gather as much information as possible. “They still need current data, or their GPT models will be stuck in September 2021 forever,” said one user. Another user, concerned about privacy, claimed that “OpenAI isn’t even citing in moderation. It’s making a derivative work without citing, thus obscuring it.”

GPTBot’s release follows recent criticism of OpenAI for previously scraping data without permission to train large language models (LLMs) such as those behind ChatGPT. The company updated its privacy policies in April in response to such concerns.

Meanwhile, the recent trademark application for GPT-5 appears to confirm that OpenAI is developing its next model in preparation for a future launch. The new system will likely use large-scale web scraping to update and broaden its training data.

This could indicate a shift from OpenAI’s early emphasis on transparency and AI safety. Still, it’s not surprising, given that ChatGPT is the most widely used LLM in the world, despite an increasingly crowded and powerful marketplace.

OpenAI’s star product, like any LLM, is only as good as the data used to train it. OpenAI needs more data, and newer data, in large quantities.

ChatGPT now has over 1.5 billion monthly active users. Microsoft’s $10 billion investment in OpenAI also appears to have been farsighted, as ChatGPT integration has enhanced Bing’s capabilities.

For the time being, OpenAI leads the hot AI space, with tech titans racing to catch up. The company’s new web crawler could improve the capabilities of its models. However, expanding internet data collection raises ethical concerns about copyright and consent.

Balancing transparency, ethics, and capabilities will remain complex as AI systems become more sophisticated.

Disclaimer: BlockNews provides independent reporting on crypto, blockchain, and digital finance. All content is for informational purposes only and does not constitute financial advice. Readers should do their own research before making investment decisions. Some articles may use AI tools to assist in drafting, but every piece is reviewed and edited by our editorial team of experienced crypto writers and analysts before publication.
Tags: ChatGPT, OpenAI
© 2025 BlockNews
