#2: Leduc Poker | Class Materials

Solving Poker Games

Typically solving a 2-player poker game is defined as finding a Nash equilibrium strategy for both players. This is straightforward to define and has been the target of much poker research, but means finding a fixed strategy that doesn’t adapt to opponents and might not be the most profitable strategy in a field of a variety of opponents.

Small poker games can be solved through linear programming given a matrix of strategies at each information set and a matrix of payoffs (see here for more details).

As we prepare to solve larger games, we start to look at iterative algorithms.

The core feature of the iterative algorithms is self-play by traversing the game tree over all infosets and tracking the strategies and regrets at each.

Regret and EV

Regret is a measure of how much each strategy at an infoset is preferred and is used as a way to update strategies.

For a given P1 strategy and P2 strategy, a player has regret when they take an action at an infoset that was not the highest-EV action at that infoset.

Regret Exercise

What is the regret for each action?

Action Regret
A
B
C
Action Regret
A 4
B 2
C 0
Expected Value Exercise

If taking a uniform strategy at this node (i.e. \(\frac{1}{3}\) for each action), then what is the expected value of the node?

\(\mathbb{E} = \frac{1}{3}*1 + \frac{1}{3}*3+\frac{1}{3}*5 = 0.33+1+1.67 = 3\)

Poker Regret Exercise

In poker games, the regret for each action is defined as the value for that action minus the expected value of the node. Give the regret values for each action under this definition.

Action Value Poker Regret
A 1 -2
B 3 0
C 5 2

At a node in a poker game, the player prefers actions with higher regrets by this definition.

Counterfactual Regret Minimization

The most popular method for iteratively solving poker games is the Counterfactual Regret Minimization (CFR) algorithm. A good resource on CFR is this 2015 paper from the University of Alberta.

The handout solver does not exactly use CFR. You can make updates to the solver however you would like, including modifying it to become CFR.

What is a counterfactual?

Actual event: I didn’t bring an umbrella, and I got wet in the rain

Counterfactual event: If I had brought an umbrella, I wouldn’t have gotten wet

CFR Algorithm Parts

  • Tabular storing strategies and regrets at each infoset

  • Regrets based on action values compared to node EV, which is based on counterfactual values

  • Regret minimization, usually regret matching, to get new strategies

  • Average strategy converges to Nash equilibrium

  • CFR+ variation such that regrets can’t be <0

  • Linear CFR such that regrets are weighted by their recency

  • Sampling methods

    • External: Sample chance and opponent nodes

    • Chance: Sample chance only

    • Outcome: Sample outcomes