4 Models That Help Define a Good Search Engine

Every day, users ask search engines millions of questions. The information we receive can shape our opinions and behaviour.

Authors

  • Simon Coghlan

    Senior Lecturer in Digital Ethics, Centre for AI and Digital Ethics, School of Computing and Information Systems, The University of Melbourne

  • Damiano Spina

    Senior Lecturer, School of Computing Technologies, RMIT University

  • Falk Scholer

    Professor of Information Access and Retrieval, RMIT University

  • Hui Chia

    PhD Candidate in Law, The University of Melbourne

We are often not aware of their influence, but internet search tools sort and rank web content when responding to our queries. This can certainly help us learn. But search tools can also return low-quality information and even misinformation.

Recently, large language models (LLMs) have entered the search scene. While LLMs are not search engines, commercial web search engines have started to incorporate LLM-based artificial intelligence (AI) features into their products. Microsoft's Copilot and Google's AI Overviews are examples of this trend.

AI-enhanced search is marketed as convenient. But, together with other changes in the nature of search over recent decades, it raises the question: what is a good search engine?

Our new paper, published in AI and Ethics, explores this question. To make the possibilities clearer, we imagine four search tool models: Customer Servant, Librarian, Journalist and Teacher. These models reflect design elements in search tools and are loosely based on the corresponding human roles.

The four models of search tools

Customer Servant

Workers in customer service give people the things they request. If someone asks for a "burger and fries", they don't query whether the request is good for the person, or whether they might really be after something else.

The search model we call Customer Servant is somewhat like the first computer-aided information retrieval systems introduced in the 1950s. These returned sets of unranked documents matching a Boolean query - using simple logical rules to define relationships between keywords (e.g. "cats NOT dogs").
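To make this concrete, here is a minimal sketch, in Python, of how a Boolean query like "cats NOT dogs" could be evaluated over a toy collection of documents. The documents and helper names are our own illustration, not drawn from the paper or from any specific system.

```python
# Minimal sketch of Boolean retrieval in the "Customer Servant" style:
# documents are returned as an unranked set, matched purely by keyword logic.
# The documents and query here are illustrative only.

documents = {
    1: "cats are independent pets",
    2: "dogs and cats can live together",
    3: "cats groom themselves daily",
}

def matching(term):
    """Return the set of document IDs whose text contains the given keyword."""
    return {doc_id for doc_id, text in documents.items() if term in text.split()}

# Boolean query "cats NOT dogs": documents containing "cats" minus those containing "dogs"
results = matching("cats") - matching("dogs")

print(results)  # {1, 3} -- an unranked set, with the query taken exactly at face value
```

The key point the sketch illustrates is that the query is never interpreted: the system returns whatever satisfies the logic, nothing more and nothing less.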

Librarian

As the name suggests, this model somewhat resembles human librarians. The Librarian also provides content that people request, but it doesn't always take queries at face value.

Instead, it aims for "relevance" by inferring user intentions from contextual information such as location, time or the history of user interactions. Classic web search engines of the late 1990s and early 2000s that rank results and provide a list of resources - think early Google - sit in this category.
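For contrast with the Boolean sketch above, here is a highly simplified illustration of Librarian-style retrieval: results are scored and ordered rather than returned as a flat set, and a contextual signal such as the user's location can nudge the ranking. This is our own toy example, not a description of how Google or any real engine actually scores pages.

```python
# Simplified sketch of Librarian-style ranked retrieval (illustrative only; real
# engines combine a very large number of signals). Documents are scored by keyword
# overlap, then boosted by a contextual signal such as the user's inferred location.

documents = {
    "melbourne-vet.html": "cat vet clinic in melbourne open weekends",
    "cat-care.html": "general cat care and grooming tips",
    "dog-park.html": "best dog parks and walking trails",
}

def score(text, query_terms, user_location):
    words = text.split()
    overlap = sum(1 for term in query_terms if term in words)  # keyword relevance
    context_boost = 1 if user_location in words else 0          # contextual signal
    return overlap + context_boost

query = ["cat", "vet"]
user_location = "melbourne"  # inferred context, e.g. from the user's device

ranked = sorted(
    documents,
    key=lambda doc: score(documents[doc], query, user_location),
    reverse=True,
)
print(ranked)  # a ranked list of results, not an unranked set
```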

Journalist

Journalists go beyond librarians. While often responding to what people want to know, journalists carefully curate that information, at times weeding out falsehoods and canvassing various public viewpoints.

Journalists aim to make people better informed. The Journalist search model does something similar. It may customise the presentation of results by providing additional information, or by diversifying search results to give a more balanced list of viewpoints or perspectives.

Teacher

Human teachers, like journalists, aim to give accurate information. However, they may exercise even more control: teachers may strenuously debunk erroneous information while pointing learners to the very best expert sources, including lesser-known ones. They may even refuse to expand on claims they deem false or superficial.

LLM-based conversational search systems such as Copilot or Gemini may play a roughly similar role. By providing a synthesised response to a prompt, they exercise more control over presented information than classic web search engines.

They may also try to explicitly discredit problematic views on topics such as health, politics, the environment or history. They might reply with "I can't promote misinformation" or "This topic requires nuance". Some LLMs convey a strong "opinion" on what is genuine knowledge and what is unedifying.

No search model is best

We argue each search tool model has strengths and drawbacks.

The Customer Servant is highly explainable: every result can be directly tied to keywords in your query. But this precision also limits the system, as it can't grasp broader or deeper information needs beyond the exact terms used.

The Librarian model uses additional signals, such as data about clicks, to return content more closely aligned with what users are really looking for. The catch is that these systems may introduce bias. Even with the best intentions, choices about relevance and data sources can reflect underlying value judgements.

The Journalist model shifts the focus toward helping users understand topics, from science to world events, more fully. It aims to present factual information and various perspectives in balanced ways.

This approach is especially useful in moments of crisis - like a global pandemic - where countering misinformation is critical. But there's a trade-off: tweaking search results for social good raises concerns about user autonomy. It may feel paternalistic, and could open the door to broader content interventions.

The Teacher model is even more interventionist. It guides users towards what it "judges" to be good information, while criticising or discouraging access to content it deems harmful or false. This can promote learning and critical thinking.

But filtering or downranking content can also limit choice, and raises red flags if the "teacher" - whether algorithm or AI - is biased or simply wrong. Current language models often have built-in "guardrails" to align with human values, but these are imperfect. LLMs can also hallucinate plausible-sounding nonsense, or avoid offering perspectives we might actually want to hear.

Staying vigilant is key

We might prefer different models for different purposes. For example, since teacher-like LLMs synthesise and analyse vast amounts of web material, we may sometimes want their more opinionated perspective on a topic, such as on good books, world events or nutrition.

Yet sometimes we may wish to explore specific and verifiable sources about a topic for ourselves. We may also prefer search tools to downrank some content - conspiracy theories, for example.

LLMs make mistakes and can mislead with confidence. As these models become more central to search, we need to stay aware of their drawbacks, and demand transparency and accountability from tech companies on how information is delivered.

Striking the right balance with search engine design and selection is no easy task. Too much control risks eroding individual choice and autonomy, while too little could leave harms unchecked.

Our four ethical models offer a starting point for robust discussion. Further interdisciplinary research is crucial to define when and how search engines can be used ethically and responsibly.


Damiano Spina has received funding from the Australian Research Council and is an Associate Investigator of the ARC Centre of Excellence for Automated Decision-Making and Society (ADM+S).

Falk Scholer has received funding from the Australian Research Council and is an Associate Investigator of the ARC Centre of Excellence for Automated Decision-Making and Society (ADM+S).

Hui Chia and Simon Coghlan do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.
