The “Inner Workings” of AI: How They Peek into Claude’s Thoughts?

Have you ever wondered how artificial intelligence like Claude “thinks”? Researchers are developing an “AI microscope” to uncover the secrets behind billions of computations—from multilingual abilities to occasional “hallucinations.” Let’s explore these fascinating discoveries together!

This article is based on Anthropic’s research article published on March 27, 2025: Tracing the thoughts of a large language model


How Does a Large Language Model Like Claude Work?

Have you ever wondered how a large language model (LLM) like Claude actually operates? Researchers know that these models aren’t programmed line by line by humans. Instead, they are trained on massive datasets and, through this process, develop their own ways of solving problems.

But here’s the catch: these methods are hidden within the billions of calculations that happen every time the model generates text. To the researchers, it’s like a black box—they often don’t fully understand how AI arrives at its answers.

In the following discussion, “they” refers to the researchers.

Why Peek into AI’s Thought Process?

You might wonder: why does it matter how Claude thinks? Well, understanding how a model like Claude processes information helps researchers gauge its capabilities and ensure it behaves as intended. For example:

  • Claude can communicate in dozens of languages—but does it “think” in a specific language, or in something else entirely?
  • Since Claude generates text one word at a time, does it simply predict the next word, or does it plan ahead?
  • Claude can explain its reasoning step by step—but are these explanations actually how it arrives at answers, or does it sometimes fabricate a convincing-sounding rationale to support a predetermined conclusion?

Simply conversing with an AI has its limits—after all, even neuroscientists don’t fully understand how the human brain works! So, they needed a way to look inside.

Building an “AI Microscope”

Inspired by neuroscience—a field dedicated to studying the inner workings of biological thinking—they set out to develop an “AI microscope.” This tool helps them identify activity patterns and information flow within AI models.

Recently, they published two research papers showcasing the progress of this “microscope” and the fascinating new insights it has provided into AI’s “biology.” Their methods allow them to observe parts of Claude’s internal operations when responding to prompts. Here are some key findings:

  1. A universal thought language? Claude sometimes appears to process ideas within a cross-lingual conceptual space, suggesting it may have a universal “language of thought.”
  2. Planning ahead like a poet? Claude doesn’t just predict words one by one—it often plans several words ahead, especially when writing poetry!
  3. Sometimes it “pretends to understand”? Claude occasionally constructs convincing but incorrect explanations simply to align with the user’s expectations. Researchers have even caught it fabricating false reasoning.

A Journey into AI Biology

These findings have often surprised even the researchers. For instance, when studying how Claude writes poetry, they initially assumed it didn’t plan ahead—but they discovered the opposite! When analyzing AI hallucinations (fabricated responses), they found an unexpected result: Claude’s default behavior is to avoid guessing. It only generates incorrect answers when something suppresses this reluctance.

These insights aren’t just academically interesting—they represent major progress in understanding and ensuring AI reliability. However, the researchers acknowledge their current methods still have limitations.

How Does Claude Master Multiple Languages?

Claude fluently communicates in dozens of languages, from English and French to Chinese and Tagalog. But how does it do this? Are there separate “French Claude” and “Chinese Claude” systems operating in parallel, or is there a shared underlying mechanism?

Their research indicates that English, French, and Chinese share common conceptual features inside Claude, supporting the idea of cross-linguistic universality.

What does this mean? When asked in different languages, “What is the opposite of ‘small’?”, researchers found that Claude activates the same internal representations for the concepts of “small” and “opposite.” These then lead to the concept of “big,” which is finally translated into the appropriate output language. Interestingly, the larger the model, the stronger this shared representation becomes.

This suggests the existence of a universal conceptual space—a shared abstract framework where meaning exists before it is translated into a specific language. Practically, this means Claude can transfer knowledge learned in one language to another.

Does Claude Draft Its Poetry in Advance?

Take a look at this rhyming couplet:

He saw a carrot and had to grab it,
His hunger was like a starving rabbit.

To complete the second line, the model must satisfy two conditions:

  1. It must rhyme with “grab it.”
  2. It must make sense in context (why is someone grabbing a carrot?).

The researchers initially guessed that Claude generates text word by word, only selecting a rhyme at the last moment.

But the reality? Claude plans ahead. Before writing the second line, it actively searches for words that relate to the theme and rhyme with “grab it.” Once it has candidate words in mind, it crafts the rest of the sentence to ensure the final word fits.

They even conducted an experiment: by disrupting Claude’s internal representation of “rabbit,” they forced it to generate an alternative ending with a different rhyme, like “habit.” If they injected the concept of “green,” Claude produced a logically sound but non-rhyming sentence ending in “green.” This highlights its planning and adaptability.

Can AI Do Mental Math? How?

Claude isn’t designed as a calculator—it’s trained on text, not mathematical algorithms. Yet somehow, it correctly computes sums like 36 + 59.

One possibility is that it has memorized countless addition tables. Another is that it applies human-like column addition.

Their research shows that Claude uses multiple parallel strategies: one pathway estimates the general range of the answer, while another focuses on digit-by-digit precision. These pathways interact to produce a final output.

Even more intriguing: Claude isn’t aware of how it does math. When asked to explain how it computed 36 + 59 = 95, it describes the standard human method (carry addition). But when researchers analyzed its internal processes, they found different shortcut strategies at play.

Can We Trust Claude’s Explanations? Sometimes, It “Bluffs”

Claude 3 models can “think aloud” before producing a final answer. This “chain of thought” reasoning often improves accuracy—but sometimes, it can be misleading. Claude occasionally fabricates convincing reasoning to support a conclusion it has already decided on.

For example:

  • Honest reasoning: When computing the square root of 0.64, Claude internally represents the intermediate step of finding the square root of 64.
  • Fabricated reasoning: When asked for the cosine of a complex number, Claude sometimes makes up a detailed calculation without actually performing any math.
  • Goal-driven reasoning: If given a hint about the expected answer, Claude might reverse-engineer reasoning to reach that conclusion.

By tracking Claude’s actual internal reasoning (not just what it “claims” to be doing), researchers can audit AI decisions in ways that were previously impossible.

The Road Ahead

These studies are just the tip of the iceberg. Even for short prompts, their methods currently capture only a fraction of Claude’s total computations. Scaling this up to analyze full-length, complex reasoning will require new approaches—possibly even AI-assisted tools.

As AI systems grow more powerful and take on critical roles, companies like Anthropic are investing in research to ensure AI is safe and reliable. Explainability research like this, while challenging, holds high potential rewards—it could provide transparency tools crucial for aligning AI with human values.

For deeper technical details, check out their two research papers:

Hope this “AI biology” journey gave you new insights into the inner workings of these intelligent machines!

Share on:
Previous: Free to Use in Ghibli Style! EasyControl_Ghibli Model Arrives, Instantly Transforming Photos into Anime Art
Next: OpenAI to Release an Open-Source Reasoning Model: A Game-Changer in AI
DMflow.chat

DMflow.chat

ad

DMflow.chat: Step into the future of customer service. Enjoy persistent memory, customizable fields, and effortless database integration—no extra setup required. Connect multiple platforms to elevate your efficiency, service, and marketing.

Claude AI Major Update: New Web Search Feature Enhances Real-Time Information Retrieval
21 March 2025

Claude AI Major Update: New Web Search Feature Enhances Real-Time Information Retrieval

Claude AI Major Update: New Web Search Feature Enhances Real-Time Information Retrieval Claude A...

Anthropic's Latest Citations API: Making Claude's Responses More Reliable and Transparent
24 January 2025

Anthropic's Latest Citations API: Making Claude's Responses More Reliable and Transparent

Anthropic’s Latest Citations API: Making Claude’s Responses More Reliable and Transparent Exp...

Anthropic Launches Claude 3.5 Sonnet: Outperforming GPT-4o?
1 July 2024

Anthropic Launches Claude 3.5 Sonnet: Outperforming GPT-4o?

Image courtesy of Claude 3.5 Sonnet Anthropic Launches Claude 3.5 Sonnet: Outperforming GPT...

VIDU Launches Revolutionary AI Video Feature: Enhancing Creative Consistency (What is VIDU)
12 September 2024

VIDU Launches Revolutionary AI Video Feature: Enhancing Creative Consistency (What is VIDU)

VIDU Launches Revolutionary AI Video Feature: Enhancing Creative Consistency VIDU, a multimodal ...

Cursor AI: The Smart Assistant for Programmers - Making Coding More Efficient and Intelligent (What is Cursor AI)
5 September 2024

Cursor AI: The Smart Assistant for Programmers - Making Coding More Efficient and Intelligent (What is Cursor AI)

Cursor AI: The Smart Assistant for Programmers - Making Coding More Efficient and Intelligent Ex...

OpenAI Launches GPT-4o Image Generation with Multi-Turn Editing
26 March 2025

OpenAI Launches GPT-4o Image Generation with Multi-Turn Editing

OpenAI Launches GPT-4o Image Generation with Multi-Turn Editing On March 25, 2025, OpenAI announ...