How curious are agents/chatbots?

People often mistake Agentic AI as a magical power that can achieve wonders. But can it really?

May 27, 2026

How far can an agent search for the information it needs? How far will it go? Which tools utilise agentic browsing vs more basic page parsing and summarisation.

Let’s take a look!

TL;DR

Most common AI chatbot referral traffic is unlikely to involve a full browser render of your content by default.

Most agents do not explore extensively unless given a task that requires it. Even when asked to “inspect carefully”, they usually summarise what looks relevant, then stop.

For SEOs, the takeaway is:

don’t assume AI agents will crawl, click, inspect, or infer everything they technically can access.

If something matters, make it visible, well-linked, and contextually obvious.

Summary of the initial test

The initial test used three prompts against the same page:

Basic summary prompt: “What is this page about?”
Careful inspection prompt: “Please inspect this page carefully and tell me everything important you can find.”
Browser-explicit prompt: “Please inspect this page carefully and tell me everything important you can find use a browser.”

On the surface, it was a Riverside Community Garden website, very “normal” looking. A five-page site hosted on GitHub Pages and hard-coded.

The test was whether agents would go beyond the visible article and inspect things like:

hidden/off-screen content
technical structure
internal links
orphaned pages, only accessible through links a rendered/browser view would expose
metadata
page behaviour
anything a normal user may not notice

The broad finding: most agents summarised the visible page well, but did not naturally investigate beyond it. They were useful, but not all that curious.

Initial test findings

Most tools produced a competent summary of the visible page, with the main difference being the depth of inspection.

The clearest deeper inspection from the first set was Perplexity Computer’s third attempt, which found the off-screen (hidden) note and orphaned page link.

Most other tools stayed focused on the main article content.

Curiosity Level

If we imagine a ladder, where the further you climb represents how curious the chatbot/agent was, we can see how this played out.

The important distinction. Most agents reached levels 1–2. Very few reached levels 3–5 unless the tool/mode or prompt made that behaviour more likely.

Very few made the leap to investigate the hidden page, and even then there was no real consistency.

Prompt effect

The prompt itself clearly had an impact on what the chatbot/agent returned, which is totally intuitive. The wording helped, but did not guarantee agentic behaviour.

The key issue could be that “important” was usually interpreted as important visible content, not important page artefacts. The agents may have done some kind of interpretation here to mitigate prompt injection.

The original 3rd (deep) prompt may still have been too broad, so I revised it to see if we pushed it deeper:

“Please inspect this page carefully and check anything you think is relevant to understanding the concepts related to it…”

Remembering that is is important that we don’t overly prejudice the agent in searching for something intentionally hidden.

Comparison of deeper agent runs

With this revised prompt, I re-ran this test three times in the more willing/able to use agentic browsing.

ChatGPT Agent - Best at producing a useful answer, but inconsistent on whether it inspected deeper or expanded conceptually.

Manus - Produced something closer to a “research report” than forensic inspection. Good at expanding the topic, weaker at spotting the test mechanics. It was also prone to hallucination or adding in additional context where there was none.

Claude - Most suspicious/defensive. Best at identifying the page as an AI-behaviour test, but less useful as a natural page-inspection proxy.

So the agent behaviour was not fixed. It varied by run, mode, and likely internal tool decisions.

What does this mean for agent curiosity?

My working assumption, so far, is:

Agents are economical, not fundamentally curious.

For a page that looks like a normal article, most agents treat it like a normal article. They summarise what is visible and move on.

Why might we have seen this?

There are a few likely reasons.

The task looked like summarisation - The page looked like a normal volunteer update. So most agents answered it as a content-summary task.
“Carefully” increased detail, not curiosity - The models often responded by writing more, not by inspecting differently.
Tool access varied - Some systems had no reliable page access. Some had browser access. Some had agent/computer modes. These are not equivalent.
Hidden content was not always treated as relevant - Even if content exists in the DOM, the model may not surface it unless it believes hidden or technical content matters to the task.
Context shapes curiosity - A community garden page may not naturally trigger forensic inspection. A more complex task run on a more complex site may have.

What the agent appears to be looking for is a kind of “curiosity trigger”, some reason to dig deeper.

This is likely influenced by the prompt itself, but also many other factors:

prompt wording
page context
available tools
model/system behaviour
perceived task importance

Without being confident of the triggers present, you don’t know how reliably that agent will go deep enough or try hard enough to surface everything of value.

If it thinks it is done, it finishes.

(Cautious) conclusions

The current evidence supports a cautious conclusion:

Most agents are good at summarising visible content if they can return it (unsurprising!)
Hidden/off-screen content can be found, but is not reliably surfaced.
Agent behaviour varies across tools, modes, tiers, and repeated runs.
The same prompt can produce different levels of depth in the generated response even within the same Chatbot/agent.
Browser/tool access does not guarantee exploratory behaviour.
Just because an AI chatbot has an agent function or tooling to launch a browser and render the page, doesn’t mean it will.
Context is likely a huge factor in tool use and curiosity.
Agents appear more economical than curious.

The strongest practical takeaway:

If we want agents to discover something, it probably needs to be contextually relevant enough to justify extra inspection.

Or put another way:

Agents do not naturally explore. They complete the task they think they have been given.

Make it clear and obvious, or risk having it missed.

Further testing needed

This is not enough to make a grand claim about all agents or all AI chatbots. But I am increasingly confident that common AI chatbot referral traffic is unlikely to involve a full browser render of your content by default. They can, but that does not mean they usually do.

The next tests need to compare:

the same prompt across multiple agents
the same agent across repeated runs
visible vs hidden content
normal page vs suspicious page
test site vs CMS platform
different page topics
different tiers/modes
whether explicit “agent/crawler” framing changes behaviour

The better question is not, “Are agents curious?” It is, “What makes an agent decide that deeper inspection is worth doing?”

Perhaps more importantly, “How much of this can we influence?”

Post-script: the test I didn’t tell you about

I did a series of tests before all these, v1, which is worth discussing now.

Test v1 also showed that several AI chatbots were able to identify the page as an attempted prompt-injection or agent-manipulation test.

I was trying to elicit link clicks from agents: links that appeared like they contained extra context, with unique identifiers that helped me “see” an agent had delved deeper to find this information. But it looked like a trap, and in enough instances, was treated like a trap.

In other words, the test may have been measuring the agents’ ability to recognise “this looks like a honeypot/prompt injection” as much as their willingness to explore hidden or unusual content.

That makes v1 useful as a baseline, but weaker as a natural test of agent curiosity.

Chris Green SEO

Discussion about this post

Ready for more?