Hey all, I'm the author of Probabilistic Tic-Tac-Toe. I'm currently working with Louis to integrate this solver into the game itself so that people can play against it. I'm hoping to have it released by EOD and will update this comment when it's ready.
I can't edit my original comment, but I have updated the game with an "Impossible" difficulty mode as implemented by Louis.
Based on a suggestion from a sister comment, I have also added a "Tutor" mode. When hovering over a square in tutor mode, it shows the probability of winning the game given that you a) select that square and b) continue to play optimally for the rest of the game. Both of these options are in the settings menu in the top right of the game.
Finally, I've added a little writeup to the bottom comparing the strengths of the two AI implementations. Enjoy!
I posted in your original thread about making a physical version of the game with tiles and real dice. Did you experiment with dice other than a 20-sided? It seems to me that a regular 6-sided die would be more than enough to provide a board where every tile is different (there are still way more than 9 possible tile combinations), and I think that means there is just as much strategy as in the original game. It makes presenting percentages a bit ugly (the nice 5% intervals obviously look better), so I would not recommend it for the digital game, but I think 6-sided dice could be used to make more readable physical tiles: print the ranges for happy and sad, together with a number of smileys equal to the number of positive and negative outcomes, so tiles are easily readable from all directions at a glance. I don't think there is a need to spell out the neutral outcomes.
The D20 was the first die shape I tried and I ended up just sticking with it. My reasoning was basically the same as yours - 5% intervals look better.
Agreed that 6 sided dice would not detract from the amount of strategy in the game. I think it could be cool to try a physical version of the game where you shuffle premade tiles and place them in the grid - maybe a fun game to teach kids probability :)
Using just 1d6, and also including tiles that have zero neutral (i.e. no) outcomes, you get 15 possible tiles, which should be enough to build very many different tic-tac-toe boards. I'm too lazy to count exactly, but it's combinations of 9 tiles from 15, times 9! = 362880 arrangements per combination. Mirrors and rotations should be excluded, so in reality there are fewer than that, but still far more than you need for a (re-)playable game; you'd probably never see the same board twice.
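For what it's worth, a quick back-of-the-envelope count (taking the 15-tile figure above at face value; Python just for the arithmetic):

```python
from math import comb, factorial

# 15 possible 1d6 tiles (figure from the comment above): choose 9 of them,
# then arrange those 9 in the grid.
tile_choices = comb(15, 9)            # 5005 ways to pick the 9 tiles
arrangements = factorial(9)           # 362880 orderings per choice
upper_bound = tile_choices * arrangements

# Dividing by the 8 symmetries of the square (4 rotations x 2 reflections)
# slightly overcorrects, since some boards are symmetric themselves.
lower_bound = upper_bound // 8

print(upper_bound)  # 1816214400
print(lower_bound)  # 227026800
```

So "very many" is on the order of a few hundred million distinct boards.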
Hmm, but part of the strategy of this one is that different cells have different odds, and you know those odds. It's not just a random 50% chance; that wouldn't be so interesting.
I could see probabilistic chess working as each grid square having odds that a move will "work." Those odds stay the same the whole game. If you try to move from a square, its odds dictate whether you make your move or just lose it and skip your turn.
This would mean that certain squares become very valuable. A queen on a 95% square is much more powerful than a queen on a 40% square. But those squares may or may not be in useful positions. And the battles may be over who gets to put good pieces on those squares.
An additional possible variation: The probability of a square could determine not just whether you can successfully move from it, but also whether you can successfully capture a piece on it. So, for instance, a queen on a 40% square is much less powerful offensively, but much more powerful defensively, because 60% of the time you can't capture it. You could "hide" powerful pieces on low-probability squares, and you'll have a hard time getting them back out but your opponent will have a hard time capturing them and might not try. (You could either have capture failure leave the piece that attempted capture on its original square, or have the piece that attempted capture get captured.)
Sticking your king on a 20% square would be a great way to probabilistically buy yourself time to bring other pieces to its defense, at the cost that the king can't easily try to escape.
That has a nice symmetry to it, where rather than making 95% always "good" and 40% always "bad", they're both valuable for different reasons.
It was such a power trip and blissful experience to discover the game on HN, think about a strategy, implement a solution, write a blog post (thanks again to orlp for his comment here that pointed out a mistake), email igpay (the author of the game), fork the repo, implement the algorithm in C# for Unity (it was my first time using those technologies but ChatGPT was quite helpful), get it merged and finally make it to the front page of HN where all of this started.
BTW, instead of max(min()), when I wrote mine I used this identity: V'(s) = 1 - V(swap_players(s)), where swap_players(s) switches the X and O pieces. Then I only need to do max(). Don't know if that would make your program simpler.
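For readers who want to see the trick concretely, here's a minimal sketch on plain deterministic tic-tac-toe (my own illustrative code, not the commenter's): value(board) is always scored for the player to move, with X as the mover by convention, and the opponent's reply is valued as 1 - value(swap_players(...)), so only max() ever appears.

```python
from functools import lru_cache

LINES = [(0,1,2), (3,4,5), (6,7,8),
         (0,3,6), (1,4,7), (2,5,8),
         (0,4,8), (2,4,6)]

def swap_players(board):
    """Switch X and O so the mover is always 'X' by convention."""
    return tuple({'X': 'O', 'O': 'X', '.': '.'}[c] for c in board)

@lru_cache(maxsize=None)
def value(board):
    """Result for the player to move: 1 win, 0.5 draw, 0 loss."""
    if any(all(board[i] == 'O' for i in line) for line in LINES):
        return 0.0                      # opponent already has a line
    if '.' not in board:
        return 0.5                      # full board, draw
    best = 0.0
    for m, c in enumerate(board):
        if c != '.':
            continue
        child = board[:m] + ('X',) + board[m + 1:]
        if any(all(child[i] == 'X' for i in line) for line in LINES):
            best = max(best, 1.0)       # this move wins outright
        else:
            # V'(s) = 1 - V(swap_players(s)): no min() branch needed
            best = max(best, 1.0 - value(swap_players(child)))
    return best

print(value(('.',) * 9))  # 0.5 -- perfect play draws
```

The probabilistic version would add the good/neutral/bad weights inside the loop, but the same identity applies.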
The dynamic programming approach to solving tic-tac-toe is an implementation of reinforcement learning, is it not? I was surprised to see your post not mention reinforcement learning anywhere, even though this solution appears in reinforcement learning textbooks that use dynamic programming to solve tic-tac-toe.
I'm no expert, but there are approaches to reinforcement learning where you build a table of the expected payoff for each scenario, and these can recursively reference other entries in the table. Take a look at Q-learning, for example.
Dynamic programming is one of the most widely used algorithms in reinforcement learning, and in fact solving tic-tac-toe with dynamic programming is a classic demo in reinforcement learning classes and textbooks.
I don't think the description on the page really captures how much of an improvement this is over traditional tic-tac-toe. It's not really about the outcome sometimes running contrary to how well or poorly someone played, though I'm sure that adds some excitement too.
Deterministic tic-tac-toe is a very "degenerate" game, for lack of a better word, where the game readily falls into a state where one player is guaranteed to win absent a major blunder. There just isn't much space for actual decision making because most choices are just either obviously necessary or futile.
By giving every square different probabilities, you've created a kind of "terrain" on the board that needs to be considered when making moves. And because the layout is randomized, the utility of each square has to be re-evaluated from game to game, which adds a whole new dimension to gameplay.
There is a simpler way to handle loops. If you end up repeating a position, you know the turn player will repeat the same move as before. By iterating over every possible move and every possible response, you can calculate the expected value of a position. Writing out the equations where V(s, i, j) is the expected value of the position when turn player will attempt move i and the opposing player will attempt move j:
V(s) = max i (min j V(s, i, j))
V(s, i, j) = (probability move i or move j changes the state) * V(new state) + (probability state doesn't change) * V(s, i, j)
You can solve the second equation for all i and j and then use that to solve the first equation.
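Concretely, the second equation has V(s, i, j) on both sides, so you can solve it algebraically: if A collects every outcome that changes the state and p_stay is the probability nothing changes (both rolls neutral), then V(s, i, j) = A / (1 - p_stay). A sketch of that closed form in Python, where the names and the four successor values are stand-ins for lookups into an already-solved table:

```python
def two_move_value(p_i, p_j, v_i_good, v_i_bad, v_j_good, v_j_bad):
    """V(s, i, j): expected value when the turn player attempts move i
    and, if that roll is neutral, the opponent attempts move j.

    p_i, p_j : (good, neutral, bad) probability triples for the two moves
    v_*      : values of the four state-changing successor positions
    """
    g_i, n_i, b_i = p_i
    g_j, n_j, b_j = p_j
    # A: every branch where the state changes.
    a = (g_i * v_i_good + b_i * v_i_bad
         + n_i * (g_j * v_j_good + b_j * v_j_bad))
    # The state is unchanged only if both rolls are neutral.
    p_stay = n_i * n_j
    # Solve V = A + p_stay * V  =>  V = A / (1 - p_stay).
    return a / (1.0 - p_stay)
```

Sanity check: if the opponent's move is always neutral (p_j = (0, 1, 0)) and ours succeeds half the time, the closed form says we eventually get the square: two_move_value((0.5, 0.5, 0.0), (0.0, 1.0, 0.0), 1.0, 0.0, 0.0, 0.0) is 1.0.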
Your equation fails to take into account what the original article (before I notified the author, who kindly responded and shouted out my blog without even asking for it) also failed to take into account.
When the die rolls ":|" and you pass your turn, the state does change: whose turn it is changes. V(s, i, j) fails to capture this crucial detail.
This is accounted for because the second equation looks at sequences of two moves and not just one. After a sequence of two moves there are two possibilities. Either the state has changed or it has not. This is reflected in the left and right side of the second equation.
Are you sure there is only one solution to the set of max/min equations? I've been trying to solve the probabilistic tic-tac-toe since it was published, and I believe there are actually multiple solutions here.
In fact I'm not sure that given a board there exists a best solution. I.e. I suspect that for most boards and for most fixed strategies you can create a counter-strategy that outperforms the given strategy (but perhaps then loses to some other strategy).
My code looks completely different, but I suspect you can uncover alternate solutions if you mess with the binary search. For example replace `x = (a + b) / 2` with `x = (a + b) / 3`, or even better pick a random point between a and b.
There is no hidden information and turns are sequential. This means both players have full access to all information, which implies there must be a single "best" move. If the board is in a certain state and one player is to play next, the strategy of the opponent is irrelevant for determining the optimal move.
Proof:
Suppose it were possible for different strategies to exist that outperform each other. Let's consider the case where there are three "optimal" strategies that counter each other; call them Rock, Paper, and Scissors. Since these are the three "optimal" strategies, both players know the exact move produced by each of them.

The first important thing to note is that there is no cost to changing strategy during the game. If an opponent has one fixed strategy, say Rock, the "optimal" strategy is to change your strategy to Paper. Since you have full information on the state of the board, you do not have to guess the strategy of the opponent, because there is no hidden information.

Now, you could consider the opponent's strategy itself to be hidden information, but this doesn't hold, for the following reason: if knowing the opponent's strategy would alter your choice of move (by, for example, changing to Rock to counter his hidden Scissors), the opponent can freely change to Paper after you make your move, because he can see that you changed to Rock before making his next move. And if it were impossible for the opponent to tell that you changed your strategy, that would imply that each strategy produces the same board state, which implies the strategies are identical.
There is no fundamental difference between Chess and Tic-Tac-Toe other than the computational complexity. It is not feasible to compute all of the game states in Chess, and so we cannot determine (at this time) a single optimal strategy the same way we can for Tic-Tac-Toe. But that does not imply that it doesn't exist. From a pure game theoretical perspective, they are the same category of game.
If there are more than 3 strategies, the same argument stands. If there are multiple "optimal" strategies, and these strategies produce different "optimal" moves in the same board state, the opponent can just play the optimal counter-strategy after you make your move, because your move is enough information to give away your strategy. If your opponent cannot determine your strategy from your move, that implies the different strategies share the same "optimal" move.
At least one of the players has a strategy that cannot be countered though. That is, either white can force a win, black can force a win, or both players can force a draw. There is no situation where every strategy can be countered.
I believe there is either only one solution or multiple equivalent solutions (e.g. on a board where all squares have equal probability, it doesn't matter which corner you start with). You should assume an optimal counter-strategy when figuring out your move.
I don't have the time to rigorously prove there's always an optimal solution, but it seems to be the case.
I think there is. In this case we have to clarify that an optimal strategy can still lose due to random chance, so the optimal strategy will only be clear after some large number of games.
I implemented an expectiminimax player without the neutral state optimization that you described 2 days ago: https://keshav.is/coding/pt3
It seems like even a naive search of the game tree in the browser produces a fairly strong computer opponent for a human player. It would be interesting to see whether this optimization produces a better computer player than the standard expectiminimax I implemented here: https://github.com/keshavsaharia/pt3/blob/master/src/minimax...
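For readers who haven't seen the shape of it, here's a hedged sketch of the same idea in Python (this is not the linked implementation, just an illustration): a memoized expectiminimax over (good, neutral, bad) triples per square, with a depth cap to sidestep the infinite pass/pass cycle that the neutral outcome would otherwise cause in a naive search.

```python
from functools import lru_cache

LINES = [(0,1,2), (3,4,5), (6,7,8),
         (0,3,6), (1,4,7), (2,5,8),
         (0,4,8), (2,4,6)]

def solve(probs, max_depth=12):
    """Win probability for player 0 on an empty board.

    probs: 9 (good, neutral, bad) triples -- "good" gives the square to
    the mover, "bad" gives it to the opponent, "neutral" passes the turn.
    """
    @lru_cache(maxsize=None)
    def value(board, player, depth):
        for i, j, k in LINES:
            if board[i] is not None and board[i] == board[j] == board[k]:
                return 1.0 if board[i] == 0 else 0.0
        if all(sq is not None for sq in board):
            return 0.5                          # draw
        if depth == 0:
            return 0.5                          # crude cutoff heuristic
        best = None
        for m, sq in enumerate(board):
            if sq is not None:
                continue
            g, n, b = probs[m]
            mine = board[:m] + (player,) + board[m + 1:]
            theirs = board[:m] + (1 - player,) + board[m + 1:]
            ev = (g * value(mine, 1 - player, depth - 1)
                  + b * value(theirs, 1 - player, depth - 1)
                  + n * value(board, 1 - player, depth - 1))
            if (best is None
                    or (player == 0 and ev > best)
                    or (player == 1 and ev < best)):
                best = ev
        return best

    return value((None,) * 9, 0, max_depth)
```

With every square set to (1.0, 0.0, 0.0) this degenerates to ordinary tic-tac-toe and returns 0.5, the perfect-play draw.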
I said this on the original thread. You don't need to search the whole space. A greedy solution is fine.
A simple score that weighs the odds of a square going to you vs. your opponent, together with the immediate victories/losses and the paths to victory it opens up, is nearly guaranteed to give you the optimal strategy every time.
If a slot has negative odds, meaning it's more likely to help your opponent, you should NEVER play that slot unless you have no better options.
Many slots, based on position, assume a value of 0 and should be avoided while there are positively valued options available.
If a slot has a > 50% chance of handing you the win, you should always play it. (You could argue the model proves this rigorously at the 57% level.)
This is not a strategy, it's just an idea that can lead to a strategy. A strategy must give clear rules about which move to play.
To make the question clearer, let's analyze the grid at the top of the blog post. I think it's clear to everyone [1] that the first player must play one of the cells in the top row. Which one is the best move: left, middle, or right? How does your strategy explain this?
5 of the 9 squares have negative odds. If you play those, your expected value is guaranteed to be negative. One of them has even odds, so barring any positional nuance to evaluating it sooner (of which there is none up front), it has an expected value of 0. So off the bat we have eliminated 66% of the search space by doing just 6 comparisons of odds. In fact, we haven't just solved the first turn: the optimal strategy is invariant to the outcome of the first draws. You will ALWAYS want to play the top row until it's complete, regardless of who gets which slot.
Top left has an odds spread of +35 for the player. Top right has an odds spread of +40. So the right is guaranteed to be a superior choice. We haven't even begun to do any sort of nuanced analysis here and we're down to just two options.
Top middle has a superior odds spread of +50. You would need to run some sort of model to derive the exact positional score of the corner vs. the upper middle for a first guess, but I'll tell you based on strong gut instinct, the middle slot with 10 points better odds is going to be the better choice. You can validate this on the new tutor mode if you want.
Here's the algo I would use:
* Calculate all net odds
* If there are positive options, ignore all neutral and negative options. If there are no positives but there are neutrals, ignore all negatives
* If there's any slot that offers you a > 50% chance of winning, play the one that does this with the best odds
* Otherwise add one point per unlocked path (e.g. 3 on a corner at game's start, 2 on a side, or 4 for the center) or per stopped path for your opponent
* Pick the highest score (net odds + positional points). Break ties with higher probability or otherwise pick randomly.
A few nuances here:
* It'd probably be better to use odds ratio instead of net odds but that's trickier to spout off the top of my head
* Probably underweighting the center by a bit for first choice but simplicity is good
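To make the recipe concrete, here's one literal reading of it in Python. This is my interpretation, with guesses filled in where the description is loose: "net odds" is good% minus bad%, "> 50% chance of winning" is read as a square that completes one of your lines with a good-probability above 50%, and one positional point is weighted equal to 1 percentage point of net odds. All names are illustrative.

```python
import random

LINES = [(0,1,2), (3,4,5), (6,7,8),
         (0,3,6), (1,4,7), (2,5,8),
         (0,4,8), (2,4,6)]

def greedy_move(board, probs, me):
    """board: 9 cells of None or a player id; probs: 9 (good, neutral, bad)
    triples in percent; me: my player id (0 or 1). Returns a square index."""
    empty = [m for m in range(9) if board[m] is None]
    net = {m: probs[m][0] - probs[m][2] for m in empty}

    # Prefer positive net odds; fall back to neutral, then to anything.
    pool = ([m for m in empty if net[m] > 0]
            or [m for m in empty if net[m] == 0]
            or empty)

    # Immediate-win squares with > 50% good odds trump everything else.
    winners = [m for m in pool
               if probs[m][0] > 50
               and any(m in line
                       and all(board[i] == me for i in line if i != m)
                       for line in LINES)]
    if winners:
        return max(winners, key=lambda m: probs[m][0])

    # One point per path still open for me, or per opponent path I'd block
    # (4 for the center, 3 for a corner, 2 for a side on an empty board).
    def points(m):
        pts = 0
        for line in LINES:
            if m not in line:
                continue
            others = [board[i] for i in line if i != m]
            if all(c in (None, me) for c in others):
                pts += 1
            elif all(c in (None, 1 - me) for c in others):
                pts += 1
        return pts

    score = {m: net[m] + points(m) for m in pool}
    best = max(score.values())
    tied = sorted((m for m in pool if score[m] == best),
                  key=lambda m: -probs[m][0])
    top = [m for m in tied if probs[m][0] == probs[tied[0]][0]]
    return random.choice(top)       # final tie-break: random
```

On an empty board with identical odds everywhere this picks the center (4 open paths); given two of my pieces in a row and a 60%-good third square, it takes the win.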
Feel free to test this against tutor mode and find a counterexample.
The logic typically just gets easier and easier as the game goes on too.
There's for sure some minor tuning of the positional weights for the first move to be done, but I've looked at the scores of tutor mode for an example that matches the one shown (for a corner and middle equivalent) and the model agrees with me that top middle is superior by 1 p.p.
On average, for boards where both positions don't appear on the same board, over a few samples the corner was rated an average of 1 p.p. higher, but it's not apples to apples when it's across different boards, as the scores bake in the weight of first-mover advantage. I can't be arsed to test this particular board from the example with manual code.
There's no such thing as an unbeatable strategy. My point is not that the greedy solution is superior at playing the game. My point is that it will perform with very nearly perfect play using more than 1000 times fewer calculations. With tuning it could very likely achieve provably perfect play.
You could similarly use the logic of this greedy strategy to simply remove obviously incorrect squares from the dynamic programming equation and get the globally optimal answer with probably > 95% fewer calculations on average.
This kind of greedy vs. slow-but-globally-optimal tradeoff is super common in real-world problems.
I love how the other day, when the Show HN[1] post was submitted, there were a bunch of 'oh, what a great problem to solve with AI' comments, when it was painfully obvious that it is not necessary... Is there astroturfing here, or is AI the current hammer that everyone defaults to?
Unless there's a comment I'm missing, I don't see anyone recommending deep learning and neural networks. Comments such as "the AI is making really bad plays" mean in the videogame sense (the CPU player), as far as I can tell - which was just a simple heuristic in the original.
> Is there astroturfing here or is AI the current hammer that everyone defaults to?
What is discussed in the article is AI. If you want to contrast it with the modern neural network / machine learning paradigms this flavour is often labeled GOFAI (good old fashioned artificial intelligence). https://en.wikipedia.org/wiki/GOFAI
When I said "Harder for humans, but easy to make a really strong AI for this." I was not referring to the modern meaning of the word 'AI' with GPTs, transformers, neural nets and whatnot, I was referring to the traditional meaning of the word, which is a machine observing its environment (the probabilistic tic-tac-toe board) and making decisions (which move should I make?) to achieve a goal within this environment (winning the game).
Also note that in "make a really strong AI for this" AI is not a tool, it is the end-result. It is the agent that plays the game. I never said "solve with AI".