AI Search: How LLMs Work (Part 1 of 3)

Kathryn Hillis

Director of Organic Search Strategy

Large language models (LLMs) have driven enormous investment, stock market gains, and countless headlines speculating about the transformational potential of AI. However, LLMs today are sometimes unreliable. Many companies haven’t seen measurable returns from LLM implementations, in part due to unpredictable and inconsistent outputs.

The sweeping presence and potential shortcomings of LLMs are both evident at scale in AI search. The most seismic shift toward widespread AI exposure in search has been Google’s AI Overviews, a feature that reportedly exposes billions of users to AI-generated summaries directly in their search results.

AI chatbots like ChatGPT, Gemini, and Claude have also become more mainstream ways people look for information. Chatbots do not seem to be replacing traditional search, but they are reshaping how people discover and evaluate information.

This is the first article in a three-part series about AI search. In this series, the term “AI search” applies to information-seeking experiences powered by LLMs where users get AI-generated responses rather than just a list of links. That includes search features like Google AI Overviews as well as AI chatbots like ChatGPT.

The series will cover:

AI Search Experiences & Chatbots

AI search is both new and part of the decades-long pursuit to build useful search experiences.

The traditional search goal of delivering the right blue links for a query is complex in its own right. Search engines must first correctly interpret what people mean (including the roughly 15% of queries never seen before) and quickly offer relevant, trustworthy links.

Google has been using AI to tackle the challenges of traditional search for years. Hummingbird was an important early milestone in 2013. Several years later, in 2019, BERT marked a major step forward in helping Google interpret the intent behind complex queries.

AI search powered by LLMs must contend with the challenges of traditional information retrieval as well as the distinct challenge of generating fluent, relevant responses. 

Today, AI search appears in two main forms: AI search experiences and chatbots. We’ll discuss what they are and how they compare below.

AI Search Experiences

AI-powered search experiences, referred to as AI search experiences in this series, are typically designed to retrieve web sources, synthesize findings, and provide links.

The most widespread example is Google AI Overviews. This feature uses Gemini models to generate summaries directly in Google search, grounded in web results with supporting links.

Another example is Google’s AI Mode, which is a conversational interface inside search and marketed as an expansion of AI Overviews. Google has started pushing AI Mode quite heavily, merging it with AI Overviews in December 2025 and integrating it into the mobile experience in January 2026.

Products marketed as AI answer engines, like Perplexity, are generally AI search experiences too.

(Screenshots: Google AI Overviews and Google AI Mode)

User Behavior When AI Responses Are Present

Early research suggests that people are less likely to click on links when AI responses are present.

Data from Pew Research shows users clicking on links less often when an AI Overview appears. Click-through rates vary by query, but the general decline in clicks is consistent across multiple studies on this topic.

Similarly, this AI Mode user behavior study describes how people rarely click out. Users typically stay in the interface until they’re ready to buy.

AI Chatbots for General-Purpose Use

AI chatbots like ChatGPT, Gemini, and Claude are designed to help with a wide range of tasks, including getting information. OpenAI’s consumer usage analysis classifies “Seeking Information” as one of the three most common conversational topics. 

Chatbot adoption has grown quickly. ChatGPT was still the market leader in late 2025 based on monthly active users, although its growth started to slow while Gemini adoption picked up.

AI chatbots often generate responses based on training knowledge. Market-leading chatbots can also pull in external sources as needed, including the web.

User Behavior When Using AI Chatbots

Early user research, like this 2025 study, suggests that people using chatbots tend to treat the interface as a one-stop answer rather than a starting point for exploring other sites or verifying primary sources.

Staying in the interface can be efficient when the chatbot is right, which is a likely outcome for practical tasks with clear criteria. But when AI chatbots are wrong, users might not verify claims.

Next, let’s look at the underlying mechanics of these systems.

How Do LLMs Work?

An LLM is a system trained on very large quantities of information to recognize patterns in language and predict the next likely word. LLMs don’t work like a database returning stored facts; they’re probabilistic.
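To make “probabilistic” concrete, here is a toy next-word guesser in Python. The two-word vocabulary and the probabilities are invented for illustration; real models learn distributions over tens of thousands of tokens from their training data.

```python
import random

# Toy bigram "language model": for each word, a distribution over
# likely next words. All words and probabilities here are made up
# for illustration only.
next_word_probs = {
    "large": {"language": 0.7, "scale": 0.2, "model": 0.1},
    "language": {"models": 0.8, "model": 0.2},
}

def predict_next(word, rng):
    """Sample the next word from the learned probability distribution."""
    candidates = next_word_probs[word]
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)
# The same prompt can yield different continuations across samples;
# the model guesses from a distribution, it does not look anything up.
samples = [predict_next("large", rng) for _ in range(5)]
print(samples)
```

This is why the same prompt can produce different outputs on different runs: the model samples from a distribution rather than retrieving a stored answer.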

Mark Riedl, a professor who studies human-centered AI, put it simply in his Very Gentle Introduction to LLMs article: LLMs do not have “core beliefs.” They are “word guessers.” 

Pulling from that same article again, our first instinct when interacting with an LLM should be: “I’ve probably asked it to do something that it has seen bits and pieces of before.” We’ll discuss those bits and pieces next.

Training Data

LLMs learn patterns from vast quantities of information. This includes books, web pages, and other sources. They’re trained primarily on text, although leading models can handle images, audio, and other media.

Before training begins, raw data is cleaned to remove duplication and noise. Text is broken into small pieces (tokens) and converted into numbers. Those numeric representations allow the model to learn patterns of language and relationships between terms. With enough training, the model can often produce coherent responses relevant to prompts.

Source: https://towardsdatascience.com/the-art-of-tokenization-breaking-down-text-for-ai-43c7bccaed25/
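The tokenization step above can be sketched in a few lines of Python. The toy vocabulary below is invented for illustration; real tokenizers use learned subword vocabularies (such as byte-pair encoding) with tens of thousands of entries.

```python
# Minimal sketch of tokenization: text is split into pieces and each
# piece is mapped to an integer ID. This toy vocabulary is made up;
# real tokenizers split words into subword units.
vocab = {"search": 0, "engines": 1, "rank": 2, "web": 3, "pages": 4, "<unk>": 5}

def tokenize(text):
    """Lowercase, split on whitespace, and map each piece to an ID.
    Unknown words fall back to the <unk> token."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.lower().split()]

print(tokenize("Search engines rank web pages"))  # -> [0, 1, 2, 3, 4]
print(tokenize("Search engines rank tokens"))     # -> [0, 1, 2, 5]
```

It is these numeric IDs, not raw text, that the model learns patterns over during training.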

Training generally provides the foundation for reasoning and generation in LLMs. Training data is usually the default knowledge layer for general-purpose AI chatbots, although web search may be used for select prompts, like those requiring up-to-date information or citations.

Training is powerful but also very expensive. Annual costs continue to rise. The largest training runs are projected by Epoch AI to cost more than $1 billion in 2027.

Partly due to cost, significant updates to training data are staggered, so knowledge cutoffs exist. This can result in outdated information. To help fill gaps, all major LLMs pull from external sources when needed, including from the web. We will discuss that process next.

RAG & Web Search

Retrieval-augmented generation, or RAG, is a process that enables LLMs to pull from external sources, including live web results. RAG can reduce hallucinations and improve user trust. This article will focus on web search in LLMs, a type of RAG pipeline.

In AI search experiences like Google AI Overviews, responses are generally grounded in search results with supporting links. AI chatbots like ChatGPT work a bit differently, typically starting from training data and pulling in external sources as needed, such as when a prompt requires up-to-date information, fact-checking, or citations.

For marketers, a practical reason to understand web search in an AI context is that you can often influence this retrieval layer sooner than training data, which has cutoff points as mentioned before.

The visual below, created by the author of this article, illustrates a plausible pipeline for web search in AI chatbots. The same general flow likely applies to AI search experiences too.

Source: Ida Silfverskiöld, https://medium.com/data-science-collective/how-web-search-works-in-ai-chats-727ff4328980

At a high level, the web search process works as follows:

A user provides a query. It is rewritten and might be expanded. Traditional search engines help find an initial set of URLs. In this way, strong rankings in traditional search can improve a page’s chance of being selected in this discovery stage.
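As a rough illustration of the rewrite-and-expand step, here is a hypothetical Python sketch. The synonym table is made up, and production systems typically use an LLM rather than a static table to rewrite queries.

```python
# Hypothetical sketch of query expansion before retrieval. The synonym
# table is invented for illustration; real pipelines usually generate
# query variants with an LLM.
synonyms = {"cheap": ["affordable", "budget"], "laptop": ["notebook"]}

def expand_query(query):
    """Return the original query plus variants with synonyms swapped in."""
    variants = [query]
    for word, alts in synonyms.items():
        if word in query.split():
            for alt in alts:
                variants.append(query.replace(word, alt))
    return variants

print(expand_query("cheap laptop deals"))
# -> ['cheap laptop deals', 'affordable laptop deals',
#     'budget laptop deals', 'cheap notebook deals']
```

Each variant can then be sent to a traditional search engine, widening the pool of candidate URLs.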

AI crawlers then fetch that small set of selected URLs. Most major AI crawlers do not seem to render JavaScript, or at least not reliably. This is different from Googlebot, which can process JavaScript quite well now. 

Retrieved pages are broken into smaller pieces called chunks, which are likely cached for reuse if they aren’t already.

There are different chunking strategies, and chunk size varies. Typically, a hybrid approach is used. Below is an example of one chunking strategy.

Source: https://blog.dailydoseofds.com/p/5-chunking-strategies-for-rag
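One simple strategy, fixed-size chunking with overlap, can be sketched as follows. Chunk sizes here are counted in words for simplicity; real systems typically count tokens, and the sizes shown are illustrative.

```python
# Sketch of fixed-size chunking with overlap, one common strategy.
# Overlapping chunks help avoid cutting context mid-thought.
def chunk_text(text, size=50, overlap=10):
    """Split text into word chunks of `size` words, each sharing
    `overlap` words with the previous chunk."""
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks

doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc, size=50, overlap=10)
print(len(chunks))  # 3 chunks: words 0-49, 40-89, 80-119
```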

Chunks are re-ranked and selected. The system is designed to prioritize the chunks that best match the intent of that specific prompt in that moment. The selected chunks are passed to the model, which uses them to generate an answer.
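The re-ranking step can be approximated with a simple scoring function. The sketch below uses plain word overlap as a stand-in; real systems typically score chunks with embedding similarity or a cross-encoder model.

```python
# Simplified sketch of chunk re-ranking: score each chunk against the
# query and keep the top matches. Word overlap stands in here for the
# embedding-based scoring real systems use.
def score(query, chunk):
    """Fraction of query words that appear in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q)

def select_chunks(query, chunks, top_k=2):
    """Return the top_k highest-scoring chunks, best first."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:top_k]

chunks = [
    "AI Overviews summarize web results in Google Search",
    "Chatbots answer from training data by default",
    "Retrieval pipelines rank chunks before generation",
]
best = select_chunks("how do AI Overviews summarize results", chunks)
print(best[0])  # the first chunk scores highest on word overlap
```

Only the selected chunks reach the model’s context window, which is why matching the intent of specific prompts matters more than matching a page as a whole.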

Google Search Results & AI Tools

It can be hard to pin down exact web sourcing during AI search. Google’s products use Google’s index, but sources of other products like ChatGPT can be a bit murky in places. Officially, ChatGPT uses Bing. However, there has been reporting that OpenAI used Google Search in some capacity, likely through a third-party tool like SerpApi. It’s unclear if they still do now.

There are also legal factors that might eventually affect the specifics of web search and AI tools. In late 2025, Google and Reddit filed lawsuits separately against SerpApi, Perplexity, and others over what they claim is unauthorized scraping.

Post-Training

Optimizations continue after training, including Reinforcement Learning from Human Feedback (RLHF). That process is out of scope for this article, but this article on the topic is worth reading for awareness.

Conclusion

We covered what AI search is, how it fits into the broader search ecosystem, and how LLM-powered search works at a high level. AI search and LLMs are improving in some areas, but they’re still far from perfect. In the next article, we’ll look at where LLMs can go wrong.