#2: Leduc Poker | Class Materials
Solving Poker Games
Typically solving a 2-player poker game is defined as finding a Nash equilibrium strategy for both players. This is straightforward to define and has been the target of much poker research, but means finding a fixed strategy that doesn’t adapt to opponents and might not be the most profitable strategy in a field of a variety of opponents.
Small poker games can be solved through linear programming given a matrix of strategies at each information set and a matrix of payoffs (see here for more details).
As we prepare to solve larger games, we start to look at iterative algorithms.
The core feature of the iterative algorithms is self-play by traversing the game tree over all infosets and tracking the strategies and regrets at each.
Regret and EV
Regret is a measure of how much each strategy at an infoset is preferred and is used as a way to update strategies.
For a given P1 strategy and P2 strategy, a player has regret when they take an action at an infoset that was not the highest-EV action at that infoset.
Counterfactual Regret Minimization
The most popular method for iteratively solving poker games is the Counterfactual Regret Minimization (CFR) algorithm. A good resource on CFR is this 2015 paper from the University of Alberta.
The handout solver does not exactly use CFR. You can make updates to the solver however you would like, including modifying it to become CFR.
What is a counterfactual?
Actual event: I didn’t bring an umbrella, and I got wet in the rain
Counterfactual event: If I had brought an umbrella, I wouldn’t have gotten wet
CFR Algorithm Parts
Tabular storing strategies and regrets at each infoset
Regrets based on action values compared to node EV, which is based on counterfactual values
Regret minimization, usually regret matching, to get new strategies
Average strategy converges to Nash equilibrium
CFR+ variation such that regrets can’t be <0
Linear CFR such that regrets are weighted by their recency
Sampling methods
External: Sample chance and opponent nodes
Chance: Sample chance only
Outcome: Sample outcomes