European Open Web Index pilot grants access to almost 1 petabyte of crawled data

Europe’s Open Web Index enters pilot phase in June (Image source: Dall-E 3)
Next month, the OpenWebSearch.eu consortium opens its federated Open Web Index pilot, granting researchers and developers access to almost one petabyte of European web data.

The OpenWebSearch.eu consortium will open the first federated, pan-European Open Web Index (OWI) to external testers next month. The pilot grants access to almost one petabyte of crawled web data and marks the initial step toward a long-term index designed to reach 5 PB and ultimately 10 PB of content.

Unlike a conventional search engine, the OWI functions as a shared digital library that third-party services (search portals, large language model providers, or research teams) can query to retrieve documents. A 14-member partnership of universities, supercomputing centers, technology firms, and CERN is funding the infrastructure in an effort to reduce European dependence on proprietary indexes maintained by Google, Microsoft, and other US-based operators.

Backers argue that centralization around ad-driven platforms has weakened search quality and limited linguistic coverage. By running a non-profit, standards-driven index inside the European regulatory space, the consortium hopes to encourage services that respect local data-protection rules, surface results in multiple languages, and steer clear of aggressive advertising or self-preferencing. Regulators in Brussels and London have repeatedly challenged the dominance of US tech companies on exactly these grounds.

During the pilot, academic groups, start-ups, and individual developers can obtain the dataset under a general research license or apply for a commercial license. Community manager Ursula Gmelch describes the launch as “a first step towards true European digital sovereignty,” adding that early feedback will determine how the index evolves to match real-world demand. The team is especially interested in vertical and argumentative search, retrieval-augmented generation, and related AI applications.

The timetable aligns with InvestAI, the European Commission program that aims to mobilize €200 billion (roughly $224.7 billion) for artificial-intelligence projects. An open Zoom session scheduled for 10 a.m.–noon CEST on 6 June will introduce participants to the platform and distribute credentials. If successful, the trial could give small and mid-sized European companies the raw material needed to build competitive search and AI tools independent of the prevailing US ecosystems.

Source(s)

OpenWebSearch (in English)

Nathan Ali, 2025-05-19 (Update: 2025-05-19)