#1: Kuhn Poker | Challenge (Part 2)
Challenge Part 2: Automatic Solver
This section is locked until you complete part 1.
The strategy that you used to unlock this stage has been submitted to the leaderboard; you can see it as name [undefined] below. You can resubmit to replace it with another strategy as many times as you like using the “Submit” button below. The preceding sentences will be true once the challenge goes live, which we expect to happen on Tuesday or Wednesday.
Next, use the automatic solver tool below to refine your strategies for each player in terms of the fixed action probabilities at each player’s infosets. For this challenge, we recommend that you submit strategies that form a Nash equilibrium (or equivalently, a pair of strategies such that neither player has regret).
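As a way to check a candidate pair of strategies by hand, here is a minimal Python sketch that computes player 1's EV and how much each player could gain by deviating. It rests on assumptions that are not stated on this page: the standard Kuhn Poker rules (ante 1, bet 1, A beats K beats Q), ↑ meaning bet or call and ↓ meaning check or fold, and the infoset labels being read as "own card, then the actions seen so far" (for example `_K↑` is player 2 holding K after player 1 went Up). The function names are hypothetical, and the example strategy is one of the well-known Kuhn Poker equilibria, not output from this site's solver.

```python
from itertools import permutations, product

CARDS = "QKA"  # ascending rank (assumption: A beats K beats Q)
P1_INFOSETS = ["A_", "K_", "Q_", "A_↓↑", "K_↓↑", "Q_↓↑"]
P2_INFOSETS = ["_A↑", "_A↓", "_K↑", "_K↓", "_Q↑", "_Q↓"]


def ev_player1(strategy):
    """Player 1's expected payoff per hand.

    `strategy` maps each of the 12 infosets to the probability of Up.
    """
    deals = list(permutations(CARDS, 2))  # 6 equally likely deals
    total = 0.0
    for c1, c2 in deals:
        win = 1 if CARDS.index(c1) > CARDS.index(c2) else -1
        bet1 = strategy[f"{c1}_"]       # P1 opens with Up (bet)
        call2 = strategy[f"_{c2}↑"]     # P2 goes Up (call) facing a bet
        bet2 = strategy[f"_{c2}↓"]      # P2 goes Up (bet) after a check
        call1 = strategy[f"{c1}_↓↑"]    # P1 goes Up (call) after check, bet
        ev = bet1 * (call2 * 2 * win + (1 - call2) * 1) + (1 - bet1) * (
            (1 - bet2) * win + bet2 * (call1 * 2 * win + (1 - call1) * -1)
        )
        total += ev / len(deals)
    return total


def deviation_gains(strategy):
    """Return (P1 gain, P2 gain) from each player's best pure deviation.

    EV is linear in each player's own strategy, so a best response can be
    found among that player's 2**6 pure strategies.
    """
    base = ev_player1(strategy)
    pures = list(product((0.0, 1.0), repeat=6))
    best1 = max(ev_player1({**strategy, **dict(zip(P1_INFOSETS, p))}) for p in pures)
    worst1 = min(ev_player1({**strategy, **dict(zip(P2_INFOSETS, p))}) for p in pures)
    return best1 - base, base - worst1


# One known equilibrium of standard Kuhn Poker (P1 never opens with a bet).
NASH = {
    "A_": 0.0, "K_": 0.0, "Q_": 0.0,
    "A_↓↑": 1.0, "K_↓↑": 1 / 3, "Q_↓↑": 0.0,
    "_A↑": 1.0, "_K↑": 1 / 3, "_Q↑": 0.0,
    "_A↓": 1.0, "_K↓": 0.0, "_Q↓": 1 / 3,
}

if __name__ == "__main__":
    print("P1 EV:", ev_player1(NASH))
    print("deviation gains (P1, P2):", deviation_gains(NASH))
```

Both gains being zero (up to floating-point error) is what "neither player has regret" means here; any positive gain tells you which player can still improve.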
Once you have a strategy that you believe improves on your current submission, you can re-submit and wait for the results to be re-run (which may take some minutes).
- The site should save your progress if you navigate away or refresh, though it might lose your last few edits.
- The site has no built-in way to share solutions between teammates, sorry.
- At each iteration, the solver will update all 12 nodes (in some arbitrary order), using a rule modified by the update parameters:
  - a magnitude multiplier
  - how to scale the update based on EV (currently supports: no effect or linearly)
  - how to scale the update based on the infoset’s visit probability (currently supports: no effect or linearly)
  - whether to use a learning rate to decay the magnitude over time (currently supports: no decay, or linear in the sum of updates made to this infoset since reset)
  - whether to update probability or odds
- You can set it to run for a number of iterations. (If you accidentally set it to too many, you can stop the solver by pressing “Stop” or by reloading the page.)
- Speed is hopefully self-explanatory.
- Tolerance controls the difference between action EVs that is too small to update on.
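To make the update parameters above concrete, here is a rough Python sketch of how one infoset's Up probability might be adjusted. It is a guess at the shape of such a rule, not the site's actual code: the names `UpdateConfig` and `update_up_probability` are hypothetical, "no effect" for EV is assumed to mean using only the sign of the EV gap, the decay is assumed to divide by one plus the update total, and the odds mode is assumed to multiply the odds by `exp(step)`.

```python
import math
from dataclasses import dataclass

EPS = 1e-9  # keeps the odds finite when a probability sits at 0 or 1


@dataclass
class UpdateConfig:
    magnitude: float = 0.1        # magnitude multiplier
    scale_by_ev: bool = True      # True: linear in the EV gap; False: sign only
    scale_by_visit: bool = False  # True: linear in the visit probability
    decay: bool = False           # True: shrink with the updates made so far
    update_odds: bool = False     # True: step the odds; False: the probability
    tolerance: float = 1e-6       # EV gaps at or below this are ignored


def update_up_probability(p_up, ev_up, ev_down, visit_prob, total_updates, cfg):
    """Return the new probability of Up at one infoset."""
    gap = ev_up - ev_down
    if abs(gap) <= cfg.tolerance:
        return p_up  # difference too small to update on
    step = cfg.magnitude * (gap if cfg.scale_by_ev else math.copysign(1.0, gap))
    if cfg.scale_by_visit:
        step *= visit_prob
    if cfg.decay:
        step /= 1.0 + total_updates  # assumed form of the linear decay
    if cfg.update_odds:
        p = min(max(p_up, EPS), 1.0 - EPS)
        odds = p / (1.0 - p) * math.exp(step)  # assumed multiplicative step
        return odds / (1.0 + odds)
    return min(max(p_up + step, 0.0), 1.0)  # clamp to a valid probability


if __name__ == "__main__":
    print(update_up_probability(0.5, 1.0, 0.0, 1.0, 0, UpdateConfig()))
```

The default values are arbitrary placeholders. Stepping the odds rather than the probability keeps the result strictly inside (0, 1), which is one reason such an option might exist.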
[Interactive solver display, not reproduced here. It shows one column per infoset (A_, K_, Q_, _A↑, _A↓, _K↑, _K↓, _Q↑, _Q↓, A_↓↑, K_↓↑, Q_↓↑) in these tables: Strategy probabilities, EV, EV(action=↑Up), EV(action=↓Down), Visit probabilities, and Total Updates. Two display options carry the notes “(this slows the solver down somewhat)” and, under “Other”, “(this slows the solver down significantly)”.]
The solver’s algorithm is a reinforcement-learning approach that you will use some version of for the rest of the course. Unfortunately, the vanilla version that this page defaults to doesn’t converge.
The solver controls will let you tweak how the algorithm determines the size of the updates, which is critical to having the convergence behavior you want. Try to find a setting of the controls that converges to a good solution.
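If you want to experiment with update sizes away from the page, a small self-contained loop like the following Python sketch may help. It is not the site's solver: it assumes standard Kuhn Poker rules (ante 1, bet 1, A beats K beats Q, ↑ as bet or call), uses hypothetical function names, and uses one simple update in which each infoset moves by the change in the acting player's overall EV between always going Up and always going Down there (so the step is already weighted by how often the infoset is reached). It prints exploitability for a constant and a decaying step so you can compare the two yourself.

```python
from itertools import permutations, product

CARDS = "QKA"  # ascending rank (assumption)
P1 = ["A_", "K_", "Q_", "A_↓↑", "K_↓↑", "Q_↓↑"]
P2 = ["_A↑", "_A↓", "_K↑", "_K↓", "_Q↑", "_Q↓"]


def ev1(s):
    """Player 1's EV per hand; `s` maps infoset -> probability of Up."""
    total = 0.0
    for c1, c2 in permutations(CARDS, 2):
        win = 1 if CARDS.index(c1) > CARDS.index(c2) else -1
        bet1, call2 = s[f"{c1}_"], s[f"_{c2}↑"]
        bet2, call1 = s[f"_{c2}↓"], s[f"{c1}_↓↑"]
        total += (
            bet1 * (call2 * 2 * win + (1 - call2))
            + (1 - bet1) * ((1 - bet2) * win
                            + bet2 * (call1 * 2 * win - (1 - call1)))
        ) / 6
    return total


def exploitability(s):
    """Sum of both players' best-deviation gains (0 at a Nash equilibrium)."""
    pures = list(product((0.0, 1.0), repeat=6))
    best = max(ev1({**s, **dict(zip(P1, p))}) for p in pures)
    worst = min(ev1({**s, **dict(zip(P2, p))}) for p in pures)
    return best - worst


def run(iterations, magnitude, decay):
    """Update all 12 infosets in turn, starting from uniform play."""
    s = {name: 0.5 for name in P1 + P2}
    for t in range(iterations):
        for name in P1 + P2:
            sign = 1.0 if name in P1 else -1.0  # P2 wants to lower P1's EV
            gap = sign * (ev1({**s, name: 1.0}) - ev1({**s, name: 0.0}))
            step = magnitude * gap / ((1 + t) if decay else 1)
            s[name] = min(max(s[name] + step, 0.0), 1.0)
    return s


if __name__ == "__main__":
    for decay in (False, True):
        final = run(500, 0.5, decay)
        print("decay" if decay else "constant", exploitability(final))
```

Try varying `magnitude`, the number of iterations, and the decay schedule; the exploitability figure is a single number you can watch instead of reading twelve probabilities at once.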
For this task you will almost certainly want to look at the history of updates represented in the “Solver history” box above.
While Kuhn Poker is small enough to solve by hand or by manual trial-and-error, having an efficient and effective (and converging!) algorithm for learning better strategies is going to be key in later weeks.
This week’s challenge (submit a strategy) has drifted apart somewhat from the lesson we tried to build up to (exploring the nuances of reinforcement-update algorithms). Our current thinking is that this would be better if the challenge were to submit a strategy for 100-card Kuhn Poker, which would do a better job of applying the answer to “how does a good solver update?”
Unfortunately, we ran out of time to implement the 100-card Kuhn Poker train / test tournament.
Submit to leaderboard
Re-running the tournament currently takes between one and three minutes, and there’s no indication that it’s done except the board changing. In some cases, you may have to refresh the page to see changes. We are working on improvements.
If you’ve finished all of the above, we’d like to hear about it (and about any questions you still have, which we expect you do). Then you can do any of:
- Wait for next week’s material.
- Help other students with their confusions and stuck points (and let us know how we could have improved!).
- Get a start on the next segment of the course by writing a bot that can learn from your opponent’s moves and do better than Nash against them. (For this, see the instructors for info on setting up the games environment on your own computer.)