Hey all, I'm the author of Probabilistic Tic-Tac-Toe. I'm currently working with Louis to integrate this solver into the game itself so that people can play against it. I'm hoping to have it released by EOD and will update this comment when it's ready.
I can't edit my original comment, but I have updated the game with an "Impossible" difficulty mode as implemented by Louis.
Based on a suggestion from a sister comment, I have also added a "Tutor" mode. When hovering over a square in tutor mode, it shows the probability of winning the game given that you a) select that square and b) continue to play optimally for the rest of the game. Both of these options are in the settings menu in the top right of the game.
Finally, I've added a little writeup to the bottom comparing the strengths of the two AI implementations. Enjoy!
I posted in your original thread about making a physical version of the game with tiles and real dice. Did you experiment with dice other than a 20-sided? It seems to me that a regular 6-sided die would be more than enough to provide a board where every tile is different (there are still way more than 9 possible tile combinations), and I think that means there is just as much strategy as in the original game. It makes presenting percentages a bit ugly (the nice 5% intervals obviously look better), so I would not recommend it for the digital game, but I think 6-sided dice could be used to make more readable physical tiles: print the ranges for happy and sad, together with a number of smileys equal to the number of positive and negative outcomes, so tiles are easily readable from all directions at a glance. I don't think there is a need to spell out the neutral outcomes.
The D20 was the first die shape I tried and I ended up just sticking with it. My reasoning was basically the same as yours - 5% intervals look better.
Agreed that 6 sided dice would not detract from the amount of strategy in the game. I think it could be cool to try a physical version of the game where you shuffle premade tiles and place them in the grid - maybe a fun game to teach kids probability :)
Using just 1d6, and also including tiles that have zero neutral (i.e. no) outcomes, you get 15 possible tiles, which should be enough to build very many different tic-tac-toe boards. I'm too lazy to count exactly, but it's combinations of 9 tiles from 15, times 9! = 362880 arrangements per combination. Mirrors and rotations should be excluded, so in reality there are fewer than that, but still far more than you need for a (re-)playable game; you'd probably never see the same board twice.
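For what it's worth, a quick back-of-the-envelope count (taking the 15-tile figure above at face value; Python just for the arithmetic):

```python
from math import comb, factorial

# 15 possible 1d6 tiles (figure from the comment above): choose 9 of them,
# then arrange those 9 in the grid.
tile_choices = comb(15, 9)            # 5005 ways to pick the 9 tiles
arrangements = factorial(9)           # 362880 orderings per choice
upper_bound = tile_choices * arrangements

# Dividing by the 8 symmetries of the square (4 rotations x 2 reflections)
# slightly overcorrects, since some boards are symmetric themselves.
lower_bound = upper_bound // 8

print(upper_bound)  # 1816214400
print(lower_bound)  # 227026800
```

So "very many" is on the order of a few hundred million distinct boards.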
Hmm, but part of the strategy of this one is that different cells have different odds, and you know those odds. It's not just a random 50% chance; that wouldn't be so interesting.
I could see probabilistic chess working as each grid square having odds that a move will "work." Those odds stay the same the whole game. If you try to move from a square, its odds dictate whether you make your move or just lose it and skip your turn.
This would mean that certain squares become very valuable. A queen on a 95% square is much more powerful than a queen on a 40% square. But those squares may or may not be in useful positions. And the battles may be over who gets to put good pieces on those squares.
An additional possible variation: The probability of a square could determine not just whether you can successfully move from it, but also whether you can successfully capture a piece on it. So, for instance, a queen on a 40% square is much less powerful offensively, but much more powerful defensively, because 60% of the time you can't capture it. You could "hide" powerful pieces on low-probability squares, and you'll have a hard time getting them back out but your opponent will have a hard time capturing them and might not try. (You could either have capture failure leave the piece that attempted capture on its original square, or have the piece that attempted capture get captured.)
Sticking your king on a 20% square would be a great way to probabilistically buy yourself time to bring other pieces to its defense, at the cost that the king can't easily try to escape.
That has a nice symmetry to it, where rather than making 95% always "good" and 40% always "bad", they're both valuable for different reasons.
It was such a power trip and blissful experience to discover the game on HN, think about a strategy, implement a solution, write a blog post (thanks again to orlp for his comment here that pointed out a mistake), email igpay (the author of the game), fork the repo, implement the algorithm in C# for Unity (it was my first time using those technologies but ChatGPT was quite helpful), get it merged and finally make it to the front page of HN where all of this started.
BTW, instead of max(min()), when I wrote mine I used this identity: V'(s) = 1 - V(swap_players(s)), where swap_players(s) switches the X and O pieces. Then I only need to do max(). Don't know if that would make your program simpler.
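For readers who want to see the trick concretely, here's a minimal sketch on plain deterministic tic-tac-toe (my own illustrative code, not the commenter's): value(board) is always scored for the player to move, with X as the mover by convention, and the opponent's reply is valued as 1 - value(swap_players(...)), so only max() ever appears.

```python
from functools import lru_cache

LINES = [(0,1,2), (3,4,5), (6,7,8),
         (0,3,6), (1,4,7), (2,5,8),
         (0,4,8), (2,4,6)]

def swap_players(board):
    """Switch X and O so the mover is always 'X' by convention."""
    return tuple({'X': 'O', 'O': 'X', '.': '.'}[c] for c in board)

@lru_cache(maxsize=None)
def value(board):
    """Result for the player to move: 1 win, 0.5 draw, 0 loss."""
    if any(all(board[i] == 'O' for i in line) for line in LINES):
        return 0.0                      # opponent already has a line
    if '.' not in board:
        return 0.5                      # full board, draw
    best = 0.0
    for m, c in enumerate(board):
        if c != '.':
            continue
        child = board[:m] + ('X',) + board[m + 1:]
        if any(all(child[i] == 'X' for i in line) for line in LINES):
            best = max(best, 1.0)       # this move wins outright
        else:
            # V'(s) = 1 - V(swap_players(s)): no min() branch needed
            best = max(best, 1.0 - value(swap_players(child)))
    return best

print(value(('.',) * 9))  # 0.5 -- perfect play draws
```

The probabilistic version would add the good/neutral/bad weights inside the loop, but the same identity applies.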
The dynamic programming approach to solving tic-tac-toe is an implementation of reinforcement learning, is it not? I was surprised to see your post not mention reinforcement learning anywhere, even though this solution appears in reinforcement learning textbooks that use dynamic programming to solve tic-tac-toe.
I'm no expert, but there are approaches to reinforcement learning where you build a table of the expected payoff for each scenario, and these can recursively reference other entries in the table. Take a look at Q-learning, for example.
Dynamic programming is one of the most widely used algorithms in reinforcement learning, and in fact solving tic-tac-toe with dynamic programming is a classic demo in reinforcement learning classes and textbooks.
I don't think the description on the page really captures how much of an improvement this is over traditional tic-tac-toe. It's not really about the outcome sometimes running contrary to how well or poorly someone played, though I'm sure that adds some excitement too.
Deterministic tic-tac-toe is a very "degenerate" game, for lack of a better word, where the game readily falls into a state where one player is guaranteed to win absent a major blunder. There just isn't much space for actual decision making because most choices are just either obviously necessary or futile.
By giving every square different probabilities, you've created a kind of "terrain" on the board that needs to be considered when making moves. And because the layout is randomized, the utility of each square has to be re-evaluated from game to game, which adds a whole new dimension to gameplay.
There is a simpler way to handle loops. If you end up repeating a position, you know the turn player will repeat the same move as before. By iterating over every possible move and every possible response, you can calculate the expected value of a position. Writing out the equations where V(s, i, j) is the expected value of the position when turn player will attempt move i and the opposing player will attempt move j:
V(s) = max i (min j V(s, i, j))
V(s, i, j) = (probability move i or move j changes the state) * V(new state) + (probability state doesn't change) * V(s, i, j)
You can solve the second equation for all i and j and then use that to solve the first equation.
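Concretely, the second equation has V(s, i, j) on both sides, so you can solve it algebraically: if A collects every outcome that changes the state and p_stay is the probability nothing changes (both rolls neutral), then V(s, i, j) = A / (1 - p_stay). A sketch of that closed form in Python, where the names and the four successor values are stand-ins for lookups into an already-solved table:

```python
def two_move_value(p_i, p_j, v_i_good, v_i_bad, v_j_good, v_j_bad):
    """V(s, i, j): expected value when the turn player attempts move i
    and, if that roll is neutral, the opponent attempts move j.

    p_i, p_j : (good, neutral, bad) probability triples for the two moves
    v_*      : values of the four state-changing successor positions
    """
    g_i, n_i, b_i = p_i
    g_j, n_j, b_j = p_j
    # A: every branch where the state changes.
    a = (g_i * v_i_good + b_i * v_i_bad
         + n_i * (g_j * v_j_good + b_j * v_j_bad))
    # The state is unchanged only if both rolls are neutral.
    p_stay = n_i * n_j
    # Solve V = A + p_stay * V  =>  V = A / (1 - p_stay).
    return a / (1.0 - p_stay)
```

Sanity check: if the opponent's move is always neutral (p_j = (0, 1, 0)) and ours succeeds half the time, the closed form says we eventually get the square: two_move_value((0.5, 0.5, 0.0), (0.0, 1.0, 0.0), 1.0, 0.0, 0.0, 0.0) is 1.0.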
Your equation fails to take into account what the original article (before I notified the author, who kindly responded and shouted out my blog without even asking for it) also failed to take into account.
When the die rolls ":|" and you pass your turn, the state does change: whose turn it is changes. V(s, i, j) fails to capture this crucial detail.
This is accounted for because the second equation looks at sequences of two moves and not just one. After a sequence of two moves there are two possibilities. Either the state has changed or it has not. This is reflected in the left and right side of the second equation.
Are you sure there is only one solution to the set of max/min equations? I've been trying to solve the probabilistic tic-tac-toe since it was published, and I believe there are actually multiple solutions here.
In fact I'm not sure that given a board there exists a best solution. I.e. I suspect that for most boards and for most fixed strategies you can create a counter-strategy that outperforms the given strategy (but perhaps then loses to some other strategy).
My code looks completely different, but I suspect you can uncover alternate solutions if you mess with the binary search. For example replace `x = (a + b) / 2` with `x = (a + b) / 3`, or even better pick a random point between a and b.
There is no hidden information and turns are sequential. This means both players have full access to all information, which implies there must be a single "best" move. If the board is in a certain state and one player is to play next, the strategy of the opponent is irrelevant for determining the optimal move.
Proof:
Suppose it were possible for different strategies to exist that outperform each other. Let's consider the case where there are three "optimal" strategies that counter each other; call them Rock, Paper, and Scissors. Since these are the three "optimal" strategies, both players know the exact move produced by each of them.

The first important thing to note is that there is no cost to changing strategy during the game. If an opponent has one fixed strategy, say Rock, the "optimal" strategy is to change your strategy to Paper. Since you have full information on the state of the board, you do not have to guess the strategy of the opponent, because there is no hidden information.

Now, you could consider the opponent's strategy itself to be hidden information, but this doesn't hold, for the following reason: if knowing the opponent's strategy would alter your choice of move (by, for example, changing to Rock to counter his hidden Scissors), the opponent can freely change to Paper after you make your move, because he can see that you changed to Rock before making his next move. And if it were impossible for the opponent to tell that you changed your strategy, that would imply that each strategy produces the same board state, which implies the strategies are identical.
There is no fundamental difference between Chess and Tic-Tac-Toe other than the computational complexity. It is not feasible to compute all of the game states in Chess, and so we cannot determine (at this time) a single optimal strategy the same way we can for Tic-Tac-Toe. But that does not imply that it doesn't exist. From a pure game theoretical perspective, they are the same category of game.
If there are more than 3 strategies, the same argument stands. If there are multiple "optimal" strategies, and these strategies produce different "optimal" moves in the same board state, the opponent can just play the optimal counter-strategy after you make your move, because your move is enough information to give away your strategy. If your opponent cannot determine your strategy from your move, that implies the different strategies share the same "optimal" move.
At least one of the players has a strategy that cannot be countered though. That is, either white can force a win, black can force a win, or both players can force a draw. There is no situation where every strategy can be countered.
I believe there is either only one solution or multiple equivalent solutions (e.g. on a board where all squares have equal probability, it doesn't matter which corner you start with). You should assume an optimal counter-strategy when figuring out your move.
I don't have the time to rigorously prove there's always an optimal solution, but it seems to be the case.
I think there is. In this case we have to clarify that an optimal strategy can still lose due to random chance, so the optimal strategy will only be clear after some large number of games.
I implemented an expectiminimax player without the neutral state optimization that you described 2 days ago: https://keshav.is/coding/pt3
It seems like even a naive search of the game tree in the browser produces a fairly strong computer opponent for a human player. It would be interesting to see whether this optimization produces a better computer player than the standard expectiminimax I implemented here: https://github.com/keshavsaharia/pt3/blob/master/src/minimax...
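For readers who haven't seen the shape of it, here's a hedged sketch of the same idea in Python (this is not the linked implementation, just an illustration): a memoized expectiminimax over (good, neutral, bad) triples per square, with a depth cap to sidestep the infinite pass/pass cycle that the neutral outcome would otherwise cause in a naive search.

```python
from functools import lru_cache

LINES = [(0,1,2), (3,4,5), (6,7,8),
         (0,3,6), (1,4,7), (2,5,8),
         (0,4,8), (2,4,6)]

def solve(probs, max_depth=12):
    """Win probability for player 0 on an empty board.

    probs: 9 (good, neutral, bad) triples -- "good" gives the square to
    the mover, "bad" gives it to the opponent, "neutral" passes the turn.
    """
    @lru_cache(maxsize=None)
    def value(board, player, depth):
        for i, j, k in LINES:
            if board[i] is not None and board[i] == board[j] == board[k]:
                return 1.0 if board[i] == 0 else 0.0
        if all(sq is not None for sq in board):
            return 0.5                          # draw
        if depth == 0:
            return 0.5                          # crude cutoff heuristic
        best = None
        for m, sq in enumerate(board):
            if sq is not None:
                continue
            g, n, b = probs[m]
            mine = board[:m] + (player,) + board[m + 1:]
            theirs = board[:m] + (1 - player,) + board[m + 1:]
            ev = (g * value(mine, 1 - player, depth - 1)
                  + b * value(theirs, 1 - player, depth - 1)
                  + n * value(board, 1 - player, depth - 1))
            if (best is None
                    or (player == 0 and ev > best)
                    or (player == 1 and ev < best)):
                best = ev
        return best

    return value((None,) * 9, 0, max_depth)
```

With every square set to (1.0, 0.0, 0.0) this degenerates to ordinary tic-tac-toe and returns 0.5, the perfect-play draw.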
I said this on the original thread. You don't need to search the whole space. A greedy solution is fine.
A simple score that weighs the odds of a square going to you vs. your opponent, together with the immediate victories/losses and the paths to victory it opens up, is nearly guaranteed to give you the optimal strategy every time.
If a slot has negative odds, meaning it's more likely to help your opponent, you should NEVER play that slot unless you have no better options.
Many slots, based on position, assume a value of 0 and should be avoided while there are positively valued options available.
If a slot has a > 50% chance of handing you the win, you should always play it. (You could argue the model proves this rigorously at the 57% level.)
This is not a strategy, it's just an idea that can lead to a strategy. A strategy must give clear rules about which move to play.
To make the question clearer, let's analyze the grid at the top of the blog post. I think it's clear to everyone [1] that the first player must play one of the cells in the top row. Which one is the best move: left, middle, or right? How does your strategy explain this?
5 of the 9 squares have negative odds. If you play those, your expected value is guaranteed to be negative. One of them has even odds, so barring any positional nuance to evaluating it sooner (of which there is none up front), it has an expected value of 0. So off the bat we have eliminated 66% of the search space by doing just 6 comparisons of odds. In fact, we haven't just solved the first turn: the optimal strategy is invariant to the outcome of the first draws. You will ALWAYS want to play the top row until it's complete, regardless of who gets which slot.
Top left has an odds spread of +35 for the player. Top right has an odds spread of +40. So the right is guaranteed to be a superior choice. We haven't even begun to do any sort of nuanced analysis here and we're down to just two options.
Top middle has a superior odds spread of +50. You would need to run some sort of model to derive the exact positional score of the corner vs. the upper middle for a first guess, but I'll tell you based on strong gut instinct, the middle slot with 10 points better odds is going to be the better choice. You can validate this on the new tutor mode if you want.
Here's the algo I would use:
* Calculate all net odds
* If there are positive options, ignore all neutral and negative options. If there are no positives but there are neutrals, ignore all negatives
* If there's any slot that offers you a > 50% chance of winning, play the one that does this with the best odds
* Otherwise add one point per unlocked path (e.g. 3 on a corner at game's start, 2 on a side, or 4 for the center) or per stopped path for your opponent
* Pick the highest score (net odds + positional points). Break ties with higher probability or otherwise pick randomly.
A few nuances here:
* It'd probably be better to use odds ratio instead of net odds but that's trickier to spout off the top of my head
* Probably underweighting the center by a bit for first choice but simplicity is good
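To make the recipe concrete, here's one literal reading of it in Python. This is my interpretation, with guesses filled in where the description is loose: "net odds" is good% minus bad%, "> 50% chance of winning" is read as a square that completes one of your lines with a good-probability above 50%, and one positional point is weighted equal to 1 percentage point of net odds. All names are illustrative.

```python
import random

LINES = [(0,1,2), (3,4,5), (6,7,8),
         (0,3,6), (1,4,7), (2,5,8),
         (0,4,8), (2,4,6)]

def greedy_move(board, probs, me):
    """board: 9 cells of None or a player id; probs: 9 (good, neutral, bad)
    triples in percent; me: my player id (0 or 1). Returns a square index."""
    empty = [m for m in range(9) if board[m] is None]
    net = {m: probs[m][0] - probs[m][2] for m in empty}

    # Prefer positive net odds; fall back to neutral, then to anything.
    pool = ([m for m in empty if net[m] > 0]
            or [m for m in empty if net[m] == 0]
            or empty)

    # Immediate-win squares with > 50% good odds trump everything else.
    winners = [m for m in pool
               if probs[m][0] > 50
               and any(m in line
                       and all(board[i] == me for i in line if i != m)
                       for line in LINES)]
    if winners:
        return max(winners, key=lambda m: probs[m][0])

    # One point per path still open for me, or per opponent path I'd block
    # (4 for the center, 3 for a corner, 2 for a side on an empty board).
    def points(m):
        pts = 0
        for line in LINES:
            if m not in line:
                continue
            others = [board[i] for i in line if i != m]
            if all(c in (None, me) for c in others):
                pts += 1
            elif all(c in (None, 1 - me) for c in others):
                pts += 1
        return pts

    score = {m: net[m] + points(m) for m in pool}
    best = max(score.values())
    tied = sorted((m for m in pool if score[m] == best),
                  key=lambda m: -probs[m][0])
    top = [m for m in tied if probs[m][0] == probs[tied[0]][0]]
    return random.choice(top)       # final tie-break: random
```

On an empty board with identical odds everywhere this picks the center (4 open paths); given two of my pieces in a row and a 60%-good third square, it takes the win.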
Feel free to test this against tutor mode and find a counterexample.
The logic typically just gets easier and easier as the game goes on too.
There's for sure some minor tuning of the positional weights for the first move to be done, but I've looked at the scores of tutor mode for an example that matches the one shown (for a corner and middle equivalent) and the model agrees with me that top middle is superior by 1 p.p.
On average, for boards where both positions don't appear on the same board, over a few samples the corner was rated an average of 1 p.p. higher, but it's not apples to apples when it's across different boards, as the scores bake in the weight of first-mover advantage. I can't be arsed to test this particular board from the example with manual code.
There's no such thing as an unbeatable strategy. My point is not that the greedy solution is superior at playing the game. My point is that it will perform with very nearly perfect play using more than 1000 times fewer calculations. With tuning it could very likely achieve provably perfect play.
You could similarly use the logic of this greedy strategy to simply remove obviously incorrect squares from the dynamic programming equation and get the globally optimal answer with probably > 95% fewer calculations on average.
This kind of greedy vs. slow-but-globally-optimal tradeoff is super common in real-world problems.
I love how the other day, when the Show HN[1] post was submitted, there were a bunch of 'oh, what a great problem to solve with AI' comments, when it was painfully obvious that it is not necessary... Is there astroturfing here, or is AI the current hammer that everyone defaults to?
Unless there's a comment I'm missing, I don't see anyone recommending deep learning and neural networks. Comments such as "the AI is making really bad plays" mean in the videogame sense (the CPU player), as far as I can tell - which was just a simple heuristic in the original.
> Is there astroturfing here or is AI the current hammer that everyone defaults to?
What is discussed in the article is AI. If you want to contrast it with the modern neural network / machine learning paradigms this flavour is often labeled GOFAI (good old fashioned artificial intelligence). https://en.wikipedia.org/wiki/GOFAI
When I said "Harder for humans, but easy to make a really strong AI for this." I was not referring to the modern meaning of the word 'AI' with GPTs, transformers, neural nets and whatnot, I was referring to the traditional meaning of the word, which is a machine observing its environment (the probabilistic tic-tac-toe board) and making decisions (which move should I make?) to achieve a goal within this environment (winning the game).
Also note that in "make a really strong AI for this" AI is not a tool, it is the end-result. It is the agent that plays the game. I never said "solve with AI".