Agent 01: The Move to Intuition

By Dr. Fang with the Marelus Brothers

2026

The AlphaGo Moment

On March 10, 2016, in Seoul, a machine made a move that looked absurd.

It was Game 2 of the match between Lee Sedol, one of the greatest Go players in history, and AlphaGo, an artificial intelligence system built by DeepMind. The world was watching because this was not supposed to happen yet. Computers had already conquered checkers and humbled chess champions, but Go was different. Go was older, deeper, stranger. It was the game many believed belonged to humans in a more intimate way—a game not just of calculation, but of judgment. Not just logic, but feel.

Then, on move 37, AlphaGo placed a black stone where no master expected.

For a moment, the experts thought it was a mistake. Commentators hesitated. Professional players looked confused. The move sat on the board like an error, or perhaps a provocation. AlphaGo's own model of human play, DeepMind later reported, had put the chance of a human choosing it at about one in ten thousand. It violated convention. It did not look elegant by the standards of professional Go. It did not look natural.

And yet it was brilliant.

Move 37 did not merely help AlphaGo win the game. It announced that something had changed in artificial intelligence. The machine was no longer behaving like a glorified calculator. It was producing a move that looked, to human eyes, like intuition—except it was an intuition no human had quite possessed.

That was the shock of the AlphaGo moment. Not that a machine had beaten a person. We had seen that before. The shock was that it seemed to see.

Why Go Was the Holy Grail

To understand why that move landed with such force, one must understand what Go represented.

Chess had long served as the benchmark for machine intelligence. It is a game of immense complexity, but it still yields relatively well to explicit search and tactical calculation. A sufficiently powerful machine can examine enormous numbers of candidate continuations, evaluate them with carefully engineered heuristics, and often overpower even the best humans. That was, broadly speaking, the style of classical computer dominance.

Go resists that style.

The board is simple: a nineteen-by-nineteen grid, 361 intersections in total. Two players alternately place black and white stones, trying to surround territory and capture each other’s groups. The rules can be explained in minutes. The mastery can take a lifetime.

What makes Go extraordinary is not the rules but the combinatorial explosion they produce. The number of legal board positions is roughly 2 × 10¹⁷⁰, and the number of possible games is vastly larger still. A number like that is so large that it ceases to feel numerical and becomes almost literary. It is invoked not because anyone can grasp it directly, but because it conveys the central fact: Go is too vast for brute force.
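The scale can be checked with a rough upper bound. Each of the 361 intersections is empty, black, or white, so there are at most 3³⁶¹ ways to fill the board. Many of those arrangements are illegal under the rules, so this is only a ceiling, not the count of legal positions. A minimal Python sketch of the arithmetic (the variable names are illustrative):

```python
# Upper bound on Go board configurations: each of the 19 x 19 = 361
# intersections is empty, black, or white. Many such arrangements are
# illegal (for example, stones left with no liberties), so this is a
# ceiling rather than an exact count of legal positions.
BOARD_SIZE = 19
intersections = BOARD_SIZE * BOARD_SIZE
upper_bound = 3 ** intersections

# Number of decimal digits in the bound.
digits = len(str(upper_bound))
print(intersections, digits)
```

The bound is a 173-digit number, about 1.7 × 10¹⁷². Even after discarding the illegal arrangements, what remains is far beyond anything a machine could list one by one.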

You cannot simply enumerate everything. There are too many legal moves, too many plausible plans, too many long-range interactions across the board. A move in one corner may influence a battle that will not fully emerge for dozens of turns. Strength in Go depends not just on local tactics but on shape, influence, thickness, timing, sacrifice, and balance. Professional players often describe strong moves in language that sounds almost aesthetic. A position feels light or heavy. A sequence has good shape. A group has latent potential. There are concepts, of course, but the concepts are bound up with cultivated pattern recognition.

In other words, Go was widely seen as the domain where human intuition still ruled.

That is why it became the holy grail of AI. If a computer could truly master Go, then perhaps machines were no longer limited to tasks that reward raw speed and explicit logic. Perhaps they could enter the realm of tacit judgment.

How AlphaGo Learned: First from Us, Then Beyond Us

AlphaGo did not begin as a superhuman oracle. It learned in stages.

The first stage was supervised learning. In ordinary terms, this means learning from examples. AlphaGo was shown roughly 30 million board positions from games between strong human players and trained to predict what those players did next. Presented with a snapshot of the board, it learned to guess the move a strong human would most likely choose.

This mattered for two reasons. First, it gave AlphaGo a starting point. Instead of blundering randomly across the board, it began with a statistical sense of what competent Go looks like. Second, it allowed the system to absorb decades—indeed centuries—of accumulated human practice indirectly through recorded games. In this phase, AlphaGo was, in effect, a model apprentice. It studied the masters.
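The training signal in this stage can be illustrated with a deliberately tiny sketch. It is not AlphaGo's method: the real system trained a deep neural network, while the code below swaps in a plain lookup table that counts which move was played in each position. The position labels, move names, and function names are all hypothetical.

```python
from collections import Counter, defaultdict

def train_move_predictor(examples):
    """Count which move the expert chose in each position.

    A lookup-table stand-in for a neural network, for illustration only.
    """
    counts = defaultdict(Counter)
    for position, expert_move in examples:
        counts[position][expert_move] += 1
    return counts

def predict(counts, position):
    """Return the empirical distribution over moves seen in this position."""
    seen = counts.get(position)
    if not seen:
        return {}  # the table has nothing to say about unseen positions
    total = sum(seen.values())
    return {move: n / total for move, n in seen.items()}

# Hypothetical training data: (position label, move the expert played).
examples = [
    ("opening A", "corner"),
    ("opening A", "corner"),
    ("opening A", "corner"),
    ("opening A", "side"),
]
model = train_move_predictor(examples)
probabilities = predict(model, "opening A")  # {"corner": 0.75, "side": 0.25}
unseen = predict(model, "opening B")         # {}
```

A table like this cannot generalize to positions it has never seen, which is one reason a neural network is needed in practice. The sketch only shows the shape of the lesson: human moves serve as the labels, and the learner is rewarded for agreeing with them.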

But imitation has a ceiling.

If AlphaGo had stopped there, it would have remained an extraordinary mimic: a machine that could reproduce patterns from elite human play without ever transcending them. It might have become very strong, but it would still be bounded by the archive of what humans had already discovered.

The real breakthrough came in the second stage: reinforcement learning.

In reinforcement learning, the machine is no longer just copying examples. It is learning from consequences. This is where AlphaGo turned from a pattern-matching system into an autonomous agent in the technical sense. In the language of modern AI, an agent is a system that interacts with an environment in a repeated loop: it observes the current state, weighs its options, takes an action, and eventually receives a reward.

For AlphaGo, the “environment” is the 19×19 board, a fully observable, deterministic world. Its “observation” is the configuration of black and white stones. Its “action” is placing the next stone. And the “reward” is the ultimate signal: win or lose. That sounds crude compared with the subtlety of the game, but it is powerful. The system does not need to be told explicitly what shape is beautiful or which stones are strategically elegant. It only needs to discover, through repeated play, which patterns tend to lead to victory.
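The observe, act, reward loop can be sketched in a few lines. Go itself is far too large for a short example, so this sketch assumes a toy stand-in: a pile of stones from which two players alternately take one to three, with the last stone winning. The class and function names are hypothetical.

```python
class NimEnvironment:
    """Toy stand-in for the Go board: fully observable and deterministic.

    Players alternately take 1-3 stones; whoever takes the last stone wins.
    """

    def __init__(self, stones=10):
        self.stones = stones
        self.player = 0  # whose turn it is: 0 or 1

    def observe(self):
        return self.stones

    def legal_actions(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def step(self, action):
        """Apply the action; return (reward for the mover, game over?)."""
        self.stones -= action
        if self.stones == 0:
            return 1, True          # the mover took the last stone and wins
        self.player = 1 - self.player
        return 0, False             # no reward until the game ends

def run_episode(choose_action, stones=10):
    """Play one game with both sides using the same decision rule."""
    env = NimEnvironment(stones)
    while True:
        state = env.observe()                                 # observe
        action = choose_action(state, env.legal_actions())    # decide
        mover = env.player
        reward, done = env.step(action)                       # act
        if done:
            return mover                                      # the winner

# A trivial decision rule: always take a single stone.
def take_one(stones, legal):
    return legal[0]

winner = run_episode(take_one)
```

Note how sparse the feedback is: every move before the last earns a reward of zero, and only the final move reveals who won. Learning from such a delayed signal is the hard part.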

So AlphaGo played against itself.

Again and again. Millions of times.

This self-play was the turning point. It meant the system was no longer merely inheriting human wisdom; it was generating its own experience at superhuman scale. Every game became a lesson. Every win reinforced certain decisions. Every loss weakened others. Over time, AlphaGo refined its judgments, not by being handed abstract principles, but by discovering what actually succeeded.
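How wins reinforce decisions and losses weaken them can be shown on the same kind of toy game (take one to three stones from a pile; the last stone wins). This is a sketch under loud assumptions: it uses a small table of move scores and a simple averaging update, whereas AlphaGo adjusted the weights of neural networks with policy-gradient methods. All names and parameter values here are hypothetical.

```python
import random

def legal_moves(stones):
    return [a for a in (1, 2, 3) if a <= stones]

def greedy_move(q, stones):
    """Pick the move with the highest learned score (ties go to the smallest)."""
    return max(legal_moves(stones), key=lambda a: q.get((stones, a), 0.0))

def train_by_self_play(games=5000, start=10, epsilon=0.2, lr=0.1, seed=0):
    """Tabular self-play: one table of scores plays both sides of every game."""
    rng = random.Random(seed)
    q = {}  # (stones, action) -> running estimate of the outcome, in [-1, 1]
    for _ in range(games):
        stones, player, history = start, 0, []
        while stones > 0:
            if rng.random() < epsilon:            # occasionally explore
                action = rng.choice(legal_moves(stones))
            else:                                 # otherwise trust experience
                action = greedy_move(q, stones)
            history.append((player, stones, action))
            stones -= action
            player = 1 - player
        winner = history[-1][0]                   # whoever took the last stone
        for mover, s, a in history:
            outcome = 1.0 if mover == winner else -1.0
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + lr * (outcome - old)  # nudge toward the result
    return q

q = train_by_self_play()
endgame_choices = [greedy_move(q, s) for s in (1, 2, 3)]
```

Nobody tells the table that taking the last stones is good. Moves that were followed by wins drift toward +1, moves followed by losses drift toward -1, and sensible endgame play emerges from the outcomes alone.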

Supervised learning taught AlphaGo to play like strong humans.

Reinforcement learning taught it how to become stronger than them.

That distinction matters. Supervised learning gave AlphaGo a map of human intuition. Reinforcement learning let it leave the map.

How AlphaGo Decided: Policy, Value, and Search

Learning how to play is one challenge. Choosing a move in the middle of an actual game is another.

Even after training, AlphaGo still faced the fundamental difficulty of Go: there are too many possibilities to examine exhaustively. It needed a way to combine learned intuition with selective calculation. This is where three key ingredients entered the picture: the policy network, the value network, and Monte Carlo Tree Search, usually abbreviated as MCTS.

These sound technical, but their functions are surprisingly intuitive.

The policy network answered a basic question: Which moves look promising here? Given a board position, it produced a probability distribution over possible next moves. In effect, it acted like a learned instinct for candidate actions. On a board with hundreds of legal possibilities, that was invaluable. Rather than treating every move as equally worthy of attention, AlphaGo had a trained sense of where to look first.

The value network answered a different question: How good is this position likely to be? Instead of playing all the way to the end every time it wanted to judge a position, AlphaGo could estimate whether the current board favored black or white. This was like a compressed strategic judgment. The value network did not say merely, “This move is common.” It said, in essence, “This position is likely winning,” or “This position is dangerous.”
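The two networks differ most visibly in the shape of their outputs. The sketch below assumes some model has already produced raw scores, and shows only the final step: a softmax turns per-move scores into a probability distribution (the policy), and a tanh squashes a single score into the range from -1 to +1 (the value). The move names and scores are invented for illustration.

```python
import math

def policy_from_scores(scores):
    """Softmax: turn raw per-move scores into a probability distribution."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {move: math.exp(s - peak) for move, s in scores.items()}
    total = sum(exps.values())
    return {move: e / total for move, e in exps.items()}

def value_from_score(score):
    """Squash one raw score into (-1, 1): near +1 is winning, near -1 losing."""
    return math.tanh(score)

# Hypothetical raw scores for three candidate moves in one position.
move_scores = {"A": 2.0, "B": 1.0, "C": 0.0}
policy = policy_from_scores(move_scores)   # one probability per move
position_value = value_from_score(0.5)     # a single number for the position
```

The policy answers "where should I look?" with a ranked list of candidates; the value answers "how am I doing?" with one number. Search needs both.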

Then came Monte Carlo Tree Search.

At first glance, MCTS is just a search method, but that phrase undersells what made it effective. Classical search tries to explore future possibilities by building a tree of moves and countermoves. In a game like Go, that tree is too large to traverse completely. MCTS deals with this by exploring selectively. It expands the most promising branches more deeply, using repeated simulations and accumulated evidence to decide where further attention is worthwhile.

The genius of AlphaGo was that MCTS was not operating blindly. It was guided by the policy network and informed by the value network. The policy network suggested which branches deserved early exploration. The value network helped estimate the promise of the positions encountered. Search, in other words, was no longer separate from learning. It was steered by learned pattern recognition.
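The way priors and value estimates steer a tree search can be sketched on the toy stone-taking game (take one to three stones; the last stone wins). This is an illustration in the general style of policy-and-value-guided search, not a reconstruction of AlphaGo's implementation. The prior and value functions are uninformed placeholders, and every name and constant is an assumption.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior       # policy prior for the move leading here
        self.visits = 0
        self.value_sum = 0.0     # summed from the view of the player to move
        self.children = {}       # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def uniform_prior(stones):
    """Placeholder policy: every legal move is equally likely."""
    actions = [a for a in (1, 2, 3) if a <= stones]
    return {a: 1.0 / len(actions) for a in actions}

def neutral_value(stones):
    """Placeholder value function: no opinion about unfinished games."""
    return 0.0

def select_child(node, c_puct):
    """Balance estimated value against prior-weighted exploration."""
    def score(item):
        _, child = item
        explore = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return -child.value() + explore  # child's value is the opponent's view
    return max(node.children.items(), key=score)

def search(stones, simulations=100, c_puct=1.4,
           prior_fn=uniform_prior, value_fn=neutral_value):
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, remaining, path = root, stones, [root]
        while node.children:                  # descend along promising moves
            action, node = select_child(node, c_puct)
            remaining -= action
            path.append(node)
        if remaining == 0:
            value = -1.0                      # the player to move has lost
        else:
            for action, p in prior_fn(remaining).items():
                node.children[action] = Node(p)   # expand the leaf
            value = value_fn(remaining)           # evaluate without playing on
        for visited in reversed(path):        # back up, flipping sign each ply
            visited.visits += 1
            visited.value_sum += value
            value = -value
    return root

root = search(stones=3)
best_action = max(root.children, key=lambda a: root.children[a].visits)
```

With three stones left, the search concentrates its visits on taking all three, the immediately winning move. Swapping in a better `prior_fn` or `value_fn` would change which branches get explored first, which is the sense in which learning steers the search.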

This hybrid architecture was the true engine behind Move 37.

AlphaGo was not just calculating faster than a human. Nor was it merely retrieving patterns from memory. It was combining three forms of judgment: a learned prior over good moves, a learned evaluation of positions, and a selective lookahead procedure that concentrated effort where it mattered most.

That is why its play felt so strange. It was not brute force in the old sense, but neither was it mystical. It was structured intuition.

The Move to Intuition

This was the deeper significance of AlphaGo.

For much of its history, AI had been associated in the public imagination with rigid logic, explicit rules, and mechanical search. It was the realm of calculators, theorem provers, and systems that excelled when the world could be cleanly formalized. Human beings, by contrast, were thought to possess something else: an ability to navigate complexity through intuition. We recognized faces instantly. We sensed danger in ambiguous situations. We made judgments we could not always fully explain.

Go seemed to belong to that second category. It was too large for exact enumeration and too subtle for simple rules. To play well required what humans called feel.

AlphaGo showed that such “feel” could, at least in some domains, be engineered through learning.

That does not mean the machine was conscious, soulful, or human-like in any deep philosophical sense. It did not experience the board. It did not admire beauty. It did not know that it was surprising anyone. But functionally, it had acquired something analogous to intuition: the ability to make high-quality decisions in a vast, uncertain space by compressing experience into patterns of judgment.

This is why Move 37 felt almost alien. It was not just good; it was good in a way that eluded the reigning human aesthetic of the moment. AlphaGo had begun by learning from human games, but through reinforcement learning and self-play it had moved beyond human habit. The move bore traces of our knowledge, but not our limits.

That is an unsettling idea.

We often assume that when we can no longer articulate the full reasons for a decision, we have entered the irreducibly human domain of intuition. AlphaGo suggested instead that intuition may sometimes be what intelligence looks like when learning has become so deep and compressed that explicit step-by-step explanation is no longer the whole story.

Not magic. Not omniscience. Learned judgment.

Why This Changed Everything

The board in Seoul was only a board. The stones were only stones. Yet the implications reached far beyond the game itself.

Games are useful proving grounds because they are bounded worlds. The rules are clear, the outcomes measurable, the objective unambiguous. But the lesson of AlphaGo was never just about Go. The deeper lesson was that AI systems could achieve remarkable performance in domains once thought too subtle, too intuitive, or too combinatorially vast for machines.

If a system could first learn from human examples, then improve through reinforcement, then use policy, value, and selective search to discover moves no human would naturally choose, what else might be possible?

What other fields depend on expert pattern recognition built from experience? Medicine, where diagnosis often begins as a hunch before it becomes an explanation. Science, where promising hypotheses must be selected from immense spaces of possibility. Engineering, design, logistics, negotiation, strategy, language. Not every domain is a game, and real-world problems are messier, morally fraught, and far less neatly scored than Go. But after AlphaGo, it became impossible to believe that “machines can calculate, humans can intuit” was a permanent dividing line.

That line had been crossed.

Move 37 was more than a move. It was a signal that AI had entered a new phase. The old image of the computer as a tireless enumerator was no longer sufficient. A new image was emerging: the machine as pattern recognizer, evaluator, explorer. A system that could inherit human knowledge, refine it through autonomous experience, and return with answers that seemed at once recognizable and strange.

That is why the moment endures.

A machine placed a stone where no one expected. The masters frowned. The commentators doubted. And then, slowly, they understood that they were witnessing not an error, but a new kind of intelligence—one trained first by human wisdom, then transformed by self-play into something that could outgrow the boundaries of its teachers.

The move looked wrong because the future often does.