#3 Rock Paper Scissors: Challenge

Challenge A: Rock Paper Scissors (RPS)

The usual RPS rules apply: Rock > Scissors > Paper > Rock, with a payout of \(+1\) for a win, \(-1\) for a loss, and \(0\) for a tie.
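
As a concrete reference, here is a minimal sketch of that payoff rule in Python; the move names are illustrative, not the engine's actual encoding.

```python
# A minimal sketch of the payoff rule above; move names are illustrative,
# not the engine's actual encoding.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def payoff(my_move: str, opp_move: str) -> int:
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if my_move == opp_move:
        return 0
    return 1 if BEATS[my_move] == opp_move else -1
```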

Repeated Rock Paper Scissors was proposed as a “benchmark for multiagent learning” in a 2023 DeepMind paper.

The paper explains that agents are often measured by (a) average return or (b) robustness against a nemesis agent that tries to minimize the agent’s returns. Yet it’s important for agents to do both: maximize returns and stay robust to adversaries.
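
Here is a hedged sketch of those two measurements for one-shot RPS with fixed mixed strategies (the real benchmark evaluates learning agents over repeated play); it is only meant to make the distinction between (a) and (b) concrete.

```python
# Sketch: the two measurements above, for fixed mixed strategies in one-shot RPS.
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def payoff(a: str, b: str) -> int:
    return 0 if a == b else (1 if BEATS[a] == b else -1)

def expected_return(p: dict, q: dict) -> float:
    """Expected payoff of mixed strategy p against mixed strategy q."""
    return sum(p[a] * q[b] * payoff(a, b) for a in MOVES for b in MOVES)

def average_return(p: dict, population: list) -> float:
    """Measure (a): average return against a population of opponents."""
    return sum(expected_return(p, q) for q in population) / len(population)

def nemesis_return(p: dict) -> float:
    """Measure (b): return against a nemesis that best-responds to p."""
    return min(sum(p[a] * payoff(a, b) for a in MOVES) for b in MOVES)

rock_heavy = {"rock": 0.6, "paper": 0.2, "scissors": 0.2}
uniform = {m: 1 / 3 for m in MOVES}
print(average_return(rock_heavy, [uniform]))  # ~0.0: looks fine on average
print(nemesis_return(rock_heavy))             # ~-0.4: heavily exploitable
```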

Why is repeated RPS a good benchmark?

  1. It’s a repeated game with sequential decisions

  2. Performance is measured against a population of opponents with varied skill levels

The Competition

You will enter a Rock Paper Scissors bot. The field will be half student bot submissions and half our bots, which will include:

  • Fixed-percentage bots
  • Not-very-sophisticated bots that act based only on the most recent observation (examples of these first two kinds are sketched after this list)
  • Some number of more advanced bots
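
To make the first two kinds concrete, here is a hedged sketch in Python. The act(history) interface and the (my_move, opp_move) history format are assumptions for illustration, not the actual engine API.

```python
import random

MOVES = ("rock", "paper", "scissors")
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # the move that beats the key

class FixedPercentageBot:
    """Plays a fixed mix, ignoring all observations."""
    def __init__(self, weights=(0.5, 0.3, 0.2)):  # illustrative percentages
        self.weights = weights

    def act(self, history):
        return random.choices(MOVES, weights=self.weights)[0]

class LastMoveBot:
    """Acts only on the most recent observation: counters the opponent's previous move."""
    def act(self, history):
        if not history:
            return random.choice(MOVES)
        _my_move, opp_move = history[-1]  # assumed (my_move, opp_move) pairs
        return COUNTER[opp_move]
```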

You will play each other bot in a 2,000-game match (once as P1 and once as P2, even though the roles don’t matter in this game).

Challenge B: Paper, Scissors, Maybe Rock (PSMR)

The gameplay works the same as RPS with the following additional rule.

Each matchup begins with the server generating two probabilities:

  • \(X\) is the probability that \(P1\) is not allowed to play Rock
  • \(Y\) is the probability that \(P2\) is not allowed to play Scissors

You will not be told your own probability or your opponent’s.
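
A sketch of how such deals might be sampled per game from the matchup-level probabilities follows; how the real server (or psmr_deals.sh) actually applies \(X\) and \(Y\) is an assumption here.

```python
import random

def generate_deals(prob_x: float, prob_y: float, num_games: int, seed=None):
    """Sketch: for each game, with probability prob_x P1 may not play Rock,
    and with probability prob_y P2 may not play Scissors.
    (Whether the restriction is re-sampled every game is an assumption.)"""
    rng = random.Random(seed)
    return [(rng.random() < prob_x, rng.random() < prob_y) for _ in range(num_games)]

# e.g., 1,000 deals where P1 loses Rock ~25% of the time and P2 loses Scissors ~10% of the time
deals = generate_deals(0.25, 0.10, 1000, seed=0)
```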

The Competition

You will play multiple 1,000-game duplicate matches against each opponent (i.e., 2,000 total games per duplicate match).

You will submit a PSMR bot and the field will include at least:

  • a bot that tries to play \(1/3\) - \(1/3\) - \(1/3\), or as close to that as it can (see the sketch after this list)
  • a bot that plays what would be Nash for it if the opponent were unconstrained
  • a bot that is our best attempt to do a reasonable thing, limited by the amount of time we actually decide to spend on it
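
For the first bot on that list, “as close as it can” presumably means spreading probability uniformly over whatever moves are legal in a given game. A minimal sketch, with an assumed legal_moves argument rather than the engine’s real interface:

```python
import random

MOVES = ("rock", "paper", "scissors")

class NearUniformBot:
    """Aims for 1/3-1/3-1/3; when a move is banned this game, plays
    uniformly over the remaining legal moves (i.e., 1/2-1/2)."""
    def act(self, legal_moves=MOVES):
        return random.choice(list(legal_moves))
```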

We don’t intend to say much more about how the \(X\) and \(Y\) probabilities will be generated. And no, your bot should not communicate with itself between matchups/duplicate matches or phone home to you. (It should, however, store information for itself from game to game within a matchup.)

In the current handout version of challenge-3-psmr in aipc-challenges, you can practice by passing --duplicate <(scripts/psmr_deals.sh PROB_X PROB_Y) to engine.py. The script psmr_deals.sh generates output that forces a particular sequence of deals, based on the two probabilities. If you want a replicable experiment, you can put a finite number of deals into a file with scripts/psmr_deals.sh PROB_X PROB_Y | head -n NUM_LINES > deals.txt and pass --duplicate deals.txt instead.