Could you point some out to me? I haven't been deliberately avoiding them by any...

guicho271828 · on July 18, 2017

I bet you already know MCTS. But I see you have a problem with defining "fast". Define "fast". You can usually only say something is faster or slower.

Absolute speed depends on the machine resource and the low-level optimization. If you take MCTS it is "enough fast" considering it has already beat Lee Sedol.

There are multiple means to measure the speed. Blind search is "fast" re: node expansion but takes exponential time to find a solution and quite likely exhausts memory. Search with heavy heuristic functions has "slow" expansion but can solve the problem quickly, despite the cost of evaluating them, since it can prune a large part of the state space and expands fewer nodes. Only the latter is practically meaningful.

Just read the classical AI textbooks like AIMA first, then [this one](https://www.amazon.co.uk/Heuristic-Search-Applications-Stefa...) if you want to know more. Maybe [this short paper](https://www.cs.cmu.edu/~maxim/files/searchcomesofage_aaai12_...) may give you the gist.

Radim · on July 19, 2017

On the topic of "likely exhausts memory":

The best search methods allow reversible state updates. The reversibility makes things super-fast -- you no longer need to copy all data structures on path expansion. Instead, you modify only a single representation, incrementally. And when retracting a search step, you "undo" the same modifications again, arriving at the exact same state as when you started.

This is of course non-trivial -- it is much easier to copy everything, then throw away the entire copy when it's not needed, rather than keeping a single state incrementally consistent. But the effects due to data locality (excellent caching), better memory management (no allocations, fragmentation) and less work (only touch and update parts of the state that matter) can be tremendous.

I haven't seen any discussion of DeepMind's implementation details for AlphaGo, but since they come from a game development background (David Silver was the CTO and lead dev at Elixir Studios), where each cycle counts, I have no doubt they're well familiar with all these concepts. But then the TPU throws a wrench into it again...

guicho271828 · on July 19, 2017

That idea is nearly 30 yrs old, it is the whole point of IDA* (Korf 85) and maybe Frontier Search (Korf 99) too. Their benefits, drawbacks and the cure are all well studied. I'm surprised some people who discuss AI do not know these famous algorithms.

MCTS's playouts do not need to backtrack (they are just the greedy probes), so it is irrelevant. By backtrack, do not confuse it with the backpropagaton in MCTS.

I do not see the connection to TPU.

Radim · on July 19, 2017

Reversible state updates have nothing to do with IDA.

The concept is completely orthogonal to choosing which node (next state) to expand next. It's about managing the internal representation of a state.

guicho271828 · on July 20, 2017

OK, reversible updates. You need to somehow store the parent state information in order to backtrack. And the natural way to do this is to push the edge/diff info in the stack. By doing so you only have to maintain one state at a time, keeping the backtrack information compact. But when to backtrack? You surely use iterative deepening, otherwise it returns a suboptimal path or may not backtrack forever on the state space graph. You also want to prune the states? Use the sum of the depth and the heuristic lower bound. This is IDA＊. You can call a blind IDA＊ as IDDFS too, but I generally just call them IDA＊. A plain DFS is not viable in complex problems, so let's forget about it.

> choosing which node (next state) to expand next

I know this is "heuristic search", a broader class of algorithms which includes A＊ and IDA＊ and many more. But the core selling point of IDA＊ is not the "heuristic search", but its compact linear-space memory usage (thus fast) compared to A＊. So I did not suggested IDA＊ because of its heuristic search aspect.

For an efficient implementation, reversible updates are just natural and common in IDA＊. Below is an excerpt from [1]. I believe this is equivalent to what you call reversible state updates.

    * In-place Modification

    The main source of remaining overhead in the expansion function is
    the copy performed when generating each new successor (line 12 in Figure 1).
    Since IDA＊ only considers a single node for expansion at a time, and because
    our recursive implementation of depth-first search always backtracks to the
    parent before considering the next successor, it is possible to avoid copying entirely.
    We will use an optimization called in-place modification where the search 
    only uses a single state that is modified to generate each successor and
    reverted upon backtracking.

[1] Burns, E. A., Hatem, M., Leighton, M. J., & Ruml, W. (2012, July). "Implementing fast heuristic search code". In Fifth Annual Symposium on Combinatorial Search.

Radim · on July 20, 2017

Yep, that's the one.

By the way, this "local-changes-only" approach crops up in many other places too. In CS and nature, because it's just so damn energy-efficient.

For example, check out this (recent, July 2017) paper:

Gomez, Ren, Urtasun & Grosse: "The Reversible Residual Network: Backpropagation Without Storing Activations", https://arxiv.org/abs/1707.04585.

You can't help but think of the ancient programmer's trick for swapping two variables without a temp storage...

  a = a + b;
  b = a - b;
  a = a - b;