WAIS and search as a peer service

WAIS is the system in this book that has been most thoroughly forgotten. Most people working on internet infrastructure in 2026 have never heard of it. Most computer-science curricula do not mention it. Most discussions of the history of the web pass over it in a sentence or skip it entirely. This is unjust, because WAIS got something fundamentally right that nothing on the modern web gets right: it treated search as a peer service that anyone could provide, with results returned by federation across many independent indexes. The single most important property of search in the world WAIS proposed — that the user, not a platform, decides which indexes to query and what to do with their results — has been almost entirely lost. The world we live in, where the verb “to search” effectively means “to ask Google,” is the world that displaced WAIS. The system that did the displacing was the same system that displaced the rest of the alternatives in this book: the web, with its emergent property that search would be provided by whoever was best at it, and that consolidation around the best provider was the natural endpoint. This chapter is about what WAIS was, what it proposed, why the proposal lost, and what has happened to federated search since.

The system

WAIS — Wide Area Information Servers, with the plural meaning what it says — was released in 1991 as a joint project of Thinking Machines Corporation, Apple Computer, Dow Jones, and KPMG. The principal architect was Brewster Kahle, then at Thinking Machines, where he had been working since the company’s founding by Danny Hillis in 1983. Thinking Machines built massively parallel supercomputers called Connection Machines; the CM-2 had 65,536 processors and the CM-5 had up to several thousand SPARC nodes. Kahle’s interest in information retrieval, originally an academic interest from his time at MIT, had become a Thinking Machines product line: the Connection Machine could index large text corpora at speeds that contemporary single-processor machines could not match, and Thinking Machines was selling supercomputer-backed search services to corporate and government customers.

WAIS was a generalization of that work to the wider internet. Kahle and his colleagues proposed that the search functionality the Connection Machine offered could be distributed: any host on the internet could run a WAIS server, indexing whatever content was on that host, and any WAIS client could query any combination of WAIS servers across the internet. The protocol that connected them was based on Z39.50, a NISO standard for library information retrieval that had been formalized in 1988. Z39.50 was a heavyweight protocol — it was designed for the federation of bibliographic systems across libraries, with rich query semantics, structured records, and elaborate session handling — but the WAIS profile of Z39.50 simplified it considerably and made it suitable for the kinds of full-text retrieval the wider internet wanted.

A WAIS server exposed a set of databases. Each database was a collection of indexed documents on some topic. The server’s job was to accept queries, run them against its databases, score the matches by relevance, and return a ranked list of results. The client’s job was to send queries, display the results to the user, retrieve the full documents on request, and — this is the federation step — distribute the same query across multiple servers when the user asked it to. The user could ask the client to query a single server, or several servers in parallel, or every server the client knew about. The client merged the results and presented them as a unified list.
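The federation step described above can be sketched in a few lines. This is a toy model, not the WAIS protocol: the ToyServer class, the Hit record, and the crude scoring rule are all hypothetical stand-ins, and the merge simply assumes that scores from different servers are comparable, which a real client could not take for granted.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    server: str   # which server returned the hit, so provenance stays visible
    title: str
    score: float  # server-assigned relevance; the scale is hypothetical


class ToyServer:
    """Stand-in for a remote server: a name plus {title: text} documents."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs

    def search(self, query):
        terms = query.lower().split()
        hits = []
        for title, text in self.docs.items():
            words = text.lower().split()
            matches = sum(words.count(term) for term in terms)
            if matches:
                # Crude score: fraction of the document's words that match.
                hits.append(Hit(self.name, title, matches / len(words)))
        return hits


def federated_search(servers, query):
    """Send one query to every chosen server in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        per_server = list(pool.map(lambda s: s.search(query), servers))
    merged = [hit for hits in per_server for hit in hits]
    # Naive merge: trusts that scores from different servers are comparable.
    return sorted(merged, key=lambda h: (-h.score, h.server, h.title))


if __name__ == "__main__":
    epa = ToyServer("epa", {"Wetland report": "wetland water quality survey"})
    library = ToyServer("library", {"Bird notes": "heron sightings near the wetland marsh"})
    for hit in federated_search([epa, library], "wetland"):
        print(hit.server, hit.title, round(hit.score, 3))
```

The user decides which servers go into the list, and every merged hit still carries the name of the server that produced it.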

There was a “directory of servers” — a kind of meta-WAIS server that indexed information about other WAIS servers. A user who wanted to find out what databases were available on a topic could query the directory and get a list of WAIS servers that had relevant content. The directory itself was a WAIS database, navigable through the same client interface as everything else. Kahle’s design was consistent: WAIS was the substrate, and everything in the system — the data, the indexes, the directory of servers — was a WAIS database, queryable the same way.
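The consistency of that design is easy to show in miniature: if the directory is just another database, the same search function serves both steps. In the sketch below, the server names, their descriptions, the document contents, and the overlap-counting search function are all invented for illustration and do not reproduce the actual WAIS directory format.

```python
def search(database, query):
    """Rank entries of a {name: text} database by number of matching query terms."""
    terms = set(query.lower().split())
    scored = []
    for name, text in database.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; names break ties so the order is deterministic.
    return [name for overlap, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# The directory is an ordinary database whose "documents" describe servers.
# All names and descriptions here are hypothetical.
directory = {
    "epa-reports": "environmental protection water air pollution reports",
    "rfc-index": "internet standards protocol rfc documents",
    "bird-notes": "bird watching field reports wetland species",
}

# The databases the directory entries point at, also hypothetical.
databases = {
    "epa-reports": {"doc-1": "wetland water pollution survey", "doc-2": "air quality"},
    "rfc-index": {"rfc-x": "transport protocol specification"},
    "bird-notes": {"note-1": "heron seen at wetland edge"},
}


def discover_and_search(topic, query):
    """Ask the directory which servers cover a topic, then query only those."""
    results = {}
    for server in search(directory, topic):
        hits = search(databases[server], query)
        if hits:
            results[server] = hits
    return results


if __name__ == "__main__":
    print(discover_and_search("wetland reports", "wetland"))
```

Nothing in discover_and_search treats the directory specially; it is queried with the same function as the databases it describes.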

The relevance ranking

What made WAIS more than a federated database query system was its relevance ranking. The Connection Machine’s parallelism allowed Thinking Machines to implement sophisticated scoring algorithms for matching queries against documents. The specifics of the scoring evolved through the protocol’s life, but the basic approach used term frequency, document length normalization, and various other factors familiar from the information retrieval research of the 1970s and 1980s. The result was that WAIS searches were ranked: when you got a list of results, the top result was the one the server’s algorithm thought was most relevant, the second was second-most, and so on. This was, in 1991, a real novelty for internet users. The dominant search tool of the time, Archie for FTP, did substring matching against file names. Veronica, the Gopher search that followed in 1992, was little richer. WAIS was doing actual relevance ranking against full-text indexes, distributed across federated servers, with results presented in a unified ranking. It was, in capability if not in interface, the first internet-scale search system that resembles what users now expect.
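To make the contrast with substring matching concrete, here is a minimal sketch of ranking by term frequency with document length normalization. The formula is a textbook one chosen for illustration, with an inverse-document-frequency weight standing in for the "various other factors"; it is an assumption, not the scoring function WAIS servers actually used.

```python
import math
from collections import Counter


def rank(docs, query):
    """Return [(title, score)] sorted by descending score.

    Score = sum over query terms of (tf / doc_length) * idf. This is a
    textbook formula used for illustration, not the actual WAIS scoring.
    """
    tokenized = {title: text.lower().split() for title, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for words in tokenized.values() for term in set(words))
    ranked = []
    for title, words in tokenized.items():
        tf = Counter(words)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                idf = math.log(1 + n_docs / df[term])
                # Dividing by length stops long documents from winning by bulk.
                score += (tf[term] / len(words)) * idf
        if score > 0:
            ranked.append((title, score))
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = {
        "short": "wetland birds",
        "long": "wetland survey of many other unrelated things and places",
        "none": "bread recipe",
    }
    for title, score in rank(docs, "wetland"):
        print(title, round(score, 3))
```

Both matching documents contain the query term once, but the shorter one ranks first because the term makes up a larger share of it. Substring matching would have returned the two in no particular order.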

WAIS also supported relevance feedback: a user could mark certain documents in the result set as relevant and ask the server for “more like these,” and the server would generate a new query incorporating features of the marked documents. This was relevance feedback in the standard information-retrieval sense, deployed in a production system in 1991. The web’s search engines would not offer users comparable feedback features until much later, and even then mostly in research interfaces rather than mainstream products.
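One simple way to build such a "more like these" query is to expand the original query with the most frequent terms of the marked documents. The sketch below does only that. The function name, the small stopword list, and the choice of frequency as the selection criterion are assumptions made for illustration, not a description of how WAIS servers built their feedback queries.

```python
from collections import Counter

# A tiny hypothetical stopword list so that filler words are not added.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "at", "to", "near"}


def more_like_these(query, marked_docs, extra_terms=3):
    """Expand a query with the most frequent terms of the marked documents."""
    original = query.lower().split()
    counts = Counter(
        word
        for text in marked_docs
        for word in text.lower().split()
        if word not in STOPWORDS and word not in original
    )
    # Sort by descending count, then alphabetically, so ties are deterministic.
    best = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return original + [term for term, _ in best[:extra_terms]]


if __name__ == "__main__":
    marked = [
        "heron nesting in the wetland",
        "heron counts near the marsh wetland",
    ]
    print(more_like_these("wetland", marked, extra_terms=2))
```

The expanded query would then be sent back through the ordinary search path, so feedback needs no separate machinery beyond the ability to construct a new query.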

What got indexed

WAIS servers proliferated through 1991 and 1992. By late 1992 there were several hundred public WAIS servers indexing topics that ranged from academic computer-science papers to bird-watching field reports to recipes to legal documents. The Library of Congress ran a WAIS server. The Environmental Protection Agency ran one. CERN itself ran a WAIS server for some of its physics literature. Apple ran one for technical documentation. The Internet Engineering Task Force ran one for the RFCs. Universities indexed their local collections; government agencies indexed their reports; corporations indexed their product documentation; hobbyists indexed their personal archives.

The federation worked. A user who wanted to find information on a topic could query the directory of servers, identify several WAIS servers with relevant content, ask their client to query all of them, and get back a single ranked list. The list would contain results from each of the queried servers, with the server source visible to the user. The user could see, for example, that the top three results came from the EPA’s environmental database, the next two from a university research library, and the next several from a government scientific archive. The federated structure was not hidden; it was a feature of the user experience.

This was a different relationship between user and search than the web’s. The web’s search engines, even in their earliest forms, presented results as if from a single index. The user did not see, except perhaps as small annotations on result lines, which servers the results came from. The query went to one place, and one place answered it. WAIS’s federation made visible what the web’s centralization concealed: that the corpus being searched was a federation of independent collections, that each collection had its own provenance and its own authority, and that the user was making implicit judgments about which sources to trust as part of the search activity.

The institutional context

WAIS came out of the Thinking Machines world. Thinking Machines was, in the late 1980s, one of the most ambitious computer companies in the United States. Its Connection Machines were used for scientific computing, intelligence work, and a small but real commercial search business. The company had assembled some of the most capable computer scientists of the period: Marvin Minsky was on the board, Richard Feynman had been a consultant before his death, and the working scientists included Danny Hillis, Brewster Kahle, Stephen Wolfram for a period, David Waltz, and many others. WAIS was, in the Thinking Machines context, a product. The company was selling Connection Machine-based search services, and WAIS was a way to extend the brand into a smaller-scale market while also serving as a substrate for selling more Connection Machines to organizations that wanted to host their own searches.

In 1992, Kahle spun off WAIS as an independent company, WAIS Inc., to develop the protocol and the associated client and server software commercially. The free reference implementation, FreeWAIS, was released and maintained by a community at the Clearinghouse for Networked Information Discovery and Retrieval (CNIDR) in North Carolina, and the free version became the dominant WAIS server for most non-commercial deployments. WAIS Inc. continued to sell premium services and software to commercial customers. The two-track approach (a free reference implementation for the broader community, a commercial implementation for enterprise) was reasonable, and worked for several years.

The web’s arrival began to compete with WAIS along several fronts simultaneously. Web-based search engines started appearing in 1993 and 1994 — Wandex, Aliweb, WebCrawler, Lycos. The web search engines did not federate across WAIS servers; they crawled web pages and built their own central indexes. The user experience was simpler — type a query into a web form, get back a list of web links — and the corpus being searched was the web, which was growing fast and was where new content was increasingly being published. WAIS’s strengths — federation, structured data, the ability to search across heterogeneous collections — were less compelling for users who mostly wanted to find web pages, and the web search engines, despite being technically less sophisticated than WAIS in some respects, became the dominant search experience by the mid-1990s.

WAIS Inc. was acquired by America Online in May 1995, three months before the Netscape IPO. AOL’s interest in WAIS was as a search infrastructure for its own walled-garden services, not as a substrate for the federated open internet. The WAIS technology was integrated into AOL’s offerings and disappeared from the public-facing internet within a few years. The FreeWAIS community continued to maintain the open-source server for several more years before that, too, wound down. By the late 1990s, WAIS was effectively gone from public use, although Z39.50 itself continued to be used in library federation (where it remains, in the present day, the substrate for federated catalog searches across academic and public libraries).

What the web’s search did not preserve

The web’s search architecture differed from WAIS’s in several specific ways that have become structurally important.

The web’s search is centralized. A search query goes to one of a small number of companies, each of which has built a global crawl of the web and a centralized index. The dominant provider (Google, in 2026) holds a market share above ninety percent in many countries. The user has, in practice, one choice when they search the web: which centralized provider’s index to query. The federation that WAIS made the foundation of its design is, on the web, an option only in narrow domains. There are federated search experiences for academic literature (Google Scholar federates, in a sense, but is itself centralized; the federated alternatives are smaller). There are federated search experiences in library catalogs. There is essentially no federated search experience for the consumer web.

The web’s search is opaque. The user does not see which sources the results came from, in the federation sense WAIS made visible. The user sees web pages, ranked by an algorithm whose specifics are commercial secrets. The user can inspect a search result and click through to its source, but the ranking — which sources the engine considered, how they were weighted, what factors influenced their placement — is invisible. The user’s relationship with the engine is one of trust: trust that the engine is searching what the user wants searched, trust that the ranking reflects relevance and not commercial interests, trust that the engine’s index is complete enough to find what is there. WAIS made the source structure visible and so made it possible for the user to assess these properties for themselves.

The web’s search has become an advertising surface. The dominant provider’s revenue model, for most of its history, has been the placement of paid results inside the organic results. The boundary between paid and unpaid has become, over the years, less and less visible to the user. Many queries now return results in which several of the top items are advertisements that the user has to look carefully to distinguish from the organic results. WAIS had no advertising. The ranking was the algorithm’s judgment of relevance, full stop. The web’s search has become inseparable, structurally, from the advertising market it funds; WAIS’s search was inseparable from the federation it organized.

The web’s search is unaccountable. A site that does not appear in the dominant provider’s results is, for most internet users, invisible. The provider has the power to remove a site from its index for any reason, with no due process and no obligation to explain. The provider has the power to demote a site for any reason, also without process. There have been periodic controversies — the European Union’s Right to be Forgotten rulings, various antitrust actions, individual site owners protesting demotion — but the underlying structure is unchanged: search is provided by a centralized authority, and being visible online is a privilege that authority grants. WAIS distributed this power across many independent servers, each accountable only for its own corpus, and any user dissatisfied with one server’s results could query another.

The federations that remain

Federated search has not disappeared entirely. In the library world, Z39.50 and its modern successor SRU/SRW have persisted; library catalogs federate across institutions through these protocols, and a user querying a federated catalog (such as WorldCat) is using something close to what WAIS proposed. The academic search systems for scientific publications use various federation protocols — OAI-PMH for harvesting metadata, OpenSearch for federated query — that retain WAIS’s basic structure. The federated search interfaces in these domains work and continue to be used.

The fediverse, treated in chapter twenty-two, has been slowly recovering federated search for the social-network domain. The various ActivityPub-based platforms do not yet have a federated search story comparable to what WAIS had — searching the fediverse is, in 2026, still difficult — but several projects (Mastodon’s optional full-text search, the various community-run search aggregators) are working in the right direction. The federated structure of the fediverse provides at least the substrate for federated search; whether it will produce a working system at scale is still an open question.

The decentralized search projects — YaCy (since 2003), various smaller efforts using blockchain and similar substrates — continue to attempt to recover the federated search property. None has reached the scale where it competes with the centralized providers for general use. The cost-and-quality gap between a single large centralized search engine and a federated network of smaller engines remains the main obstacle; building good search at scale is expensive, and federation does not, by itself, solve the expense.

Kahle after WAIS

Kahle’s career after WAIS is worth following because it is the clearest example of a designer of one of these systems continuing to do related work after the system itself was absorbed. Kahle founded the Internet Archive in 1996, the year after WAIS Inc. was acquired by AOL. The Internet Archive’s mission, universal access to all knowledge, is the WAIS mission extended to preservation. The Archive’s tools include the Wayback Machine, which preserves snapshots of the web; the Open Library, which provides federated access to library catalogs; archive.org’s collection of digitized books, films, software, and audio; and several federated-access initiatives.

The Wayback Machine is the closest thing the web has to the property WAIS made native: durable access to material. The Wayback Machine retrofits versioning onto the web by preserving snapshots; WAIS’s design assumed that the WAIS servers themselves were durable and that the content was accessible through them indefinitely. The retrofitting works imperfectly — many sites are not crawled, many crawls miss material, the legal status of the Archive’s preservation has been challenged repeatedly — but it has become an essential part of the web’s citation infrastructure, used by millions of researchers daily.

Kahle has been one of the most articulate critics of the web’s centralization, in his role at the Internet Archive and in his advocacy for what he has called the “decentralized web.” His proposal — that web content should be stored in distributed, peer-replicated form, with multiple independent copies and no single point of failure — is in significant part a restatement of what WAIS had been doing. The Decentralized Web Summit conferences Kahle has organized since 2016 have been one of the major venues where the recovery of federated infrastructure is being discussed. The lineage from WAIS to the Internet Archive to the decentralized-web movement is direct; the people are the same; the proposal is the same; the world is just larger and more centralized than it was in 1991, and the recovery is correspondingly harder.

The cost of the loss

What was lost when WAIS lost is not, on the face of it, a feature most users miss. Most users in 2026 are happy with their search engine. The federated-search ideal is abstract; the centralized-search experience is concrete and works well for most of what most users do. The cost of the loss is therefore not visible to most users on most days. It is visible at the margins, in the cases that reveal what the structure can and cannot do.

The marginal cases are: politically sensitive searches, where users have legitimate reason to doubt that the centralized provider’s index is complete; specialized academic searches, where the federated alternatives provide better recall on narrow topics than the general-purpose engines; archival searches, where the Wayback Machine and similar services do work the live web cannot; and policy disputes, where the question of who controls what is findable becomes the substance of the issue. In each of these cases, the user is reminded that search is something a small number of companies provide and that the user is, when searching, dependent on those companies’ choices.

WAIS proposed a different relationship. In WAIS’s world, the user chooses which indexes to query, and the user is responsible for assessing the indexes’ coverage and credibility. The work of search is distributed across the user and the various server operators, rather than concentrated in one company. This is more work for the user and is, in many ways, slower and less convenient than what the consolidated search providers offer. It is also more honest about what search is and who is doing it, and it is more robust against the failure modes of consolidation. The choice was not unanimous, but the world chose convenience. The chapters in Part V cover several of the smaller-scale recoveries of the federated alternative.

The next chapter takes up the third major federated service of the pre-web internet. Usenet was the place online discussion happened, for two decades, on a federated substrate that no single party controlled. Its decline is the most thorough demonstration in the history of internet services of what happens when a federated peer system is displaced by walled-garden alternatives. The decline was not predetermined, and the system’s continuing existence (Usenet is still there, in archival and reduced form) is a kind of fossil record of what online conversation can look like when no one is in charge.