this dir | view | cards | source | edit | dark
top
Algorithmic Game Theory
- game theory
- we want to mathematically model the interactions between rational agents
- normal-form game … (P,A,u)
- P … finite set of n players
- A=A1×⋯×An … set of action profiles
- A … set of all possible states of the game
- Ai … set of actions available to player i
- u=(u1,…,un)
- ui:A→R … utility function for player i
- assigns happiness of each player to every possible state of the game
- sometimes is used cost function instead, ci=−ui
- knowing the utility function, every player i choose an action ai from Ai (all players at the same time)
- strategies
- prescription how player i chooses his actions from Ai
- pure strategy si of player i is an action from Ai
- "select a single action and play it"
- pure-strategy profile … (s1,…,sn) where si∈Ai for each player i
- mixed strategy si of player i is a probability distribution over Ai
- ∀ai∈Ai:si(ai)∈[0,1] so that ∑ai∈Aisi(ai)=1
- si(ai) … probability that i chooses ai as their action
- Si … set of all mixed strategies of player i
- mixed-strategy profile … (s1,…,sn) where si∈Si for each player i
- every pure strategy is a special case of a mixed strategy
- expected payoff
- in G=(P,A,u), the expected payoff for player i of the mixed-strategy profile s=(s1,…,sn) is ui(s)=a=(a1,…,an)∈A∑ui(a)⋅j=1∏nsj(aj)
- linearity of the expected payoff
- ui(s)=∑ai∈Aisi(ai)⋅ui(ai;s−i)
- notation
- s−i=(s1,…,si−1,si+1,…,sn)
- (si′;s−i)=(s1,…,si−1,si′,si+1,…,sn)
- normal form games – examples
- games for two players
- bimatrix games – we can use two matrices to express utilities
- prisoner's dilemma – testify × remain silent
- paradoxically, the only stable solution is when both testify
- matching pennies
- each player chooses heads or tails
- if pennies match, player 1 wins, otherwise player 2 wins
- zero-sum game (as well as rock-paper-scissors)
- battle of sexes (manželský spor)
- (husband, wife)
- football × opera
- if they disagree → (0,0)
- they attend the events separately
- if they agree
- (football, football) → (2,1)
- (opera, opera) → (1,2)
- there's not a best outcome/solution
- game of chicken (hra na zbabělce)
- two drives drive towards each other on a collision course
- turn × go straight
- (turn, turn) → (0,0)
- (turn, go straight) → (1,-1)
- (go straight, turn) → (1,-1)
- (go straight, go straight) → (-10,-10)
- best response of player i to a strategy profile s−i
- mixed strategy si∗ such that ui(si∗;s−i)≥ui(si′;s−i) for each si′∈Si
- if player i knew the strategies of other players, the player would choose this one
- it maximizes his expected payoff if others play s−i
- Nash equilibrium
- a solution concept
- for a normal-form game G=(P,A,u) of n players, a Nash equilibrium (NE) in G is a strategy profile (…)
- remarks
- neither best responses nor Nash equilibria are determined uniquely
- rock-paper-scissors – unique mixed Nash equilibrium (both players play everything with probability 1/3)
- battle of sexes – three Nash equilibria, two pure and one mixed
- in every game, there is a Nash equilibrium
- preparations for the proof of Nash's theorem
- the proof is essentially topological
- we need Brouwer fixed-point theorem
- df: for d∈N, a subset X of Rd is compact ≡ X is closed and bounded
- df: Y⊆Rd is convex ≡ every line segment containing two points from Y is fully contained in Y
- formally: (∀x,y∈Y)(∀t∈[0,1]):tx+(1−t)y∈Y
- df: for n affinely independent point s x1,…,xn∈Rd, an (n−1)-simplex Δn on x1,…,xn is the set of convex combinations of the points x1,…,xn
- each simplex is a compact convex set in Rd
- lemma
- (…)
- “you cannot mess up compactness by using cartesian product”
- Brouwer's Fixed Point Theorem
- ∀d∈N, let K be a non-empty compact convex set in Rd and f:K→K be a continuous mapping
- then, there exists a fixed point x0∈K for f, that is, f(x0)=x0
- https://www.youtube.com/watch?v=csInNn6pfT4&t=268s
- proof of Nash's theorem
- let G=(P,A,u) be a normal-form game of n players
- Si … set of mixed strategies of player i
- we want to apply Brouwser's theorem, thus we need to find a suitable compact convex body K and a continuous mapping f:K→K whose fixed points are NE in G
- we start with K, let K=S1×⋯×Sn be the set of all mixed strategies
- we verify that K is compact and convex
- by definition, each Si is a simplex which is compact and convex
- by lemma, the set K is compact
- for any strategy profiles s∈K and s′∈K and a number t∈[0,1]
- …
- …
- Nash's theorem requires finite numbers of players and actions
- consider 2-player game “who guesses larger number wins”
- the proof is non-constructive
- how to find NE efficiently?
- proofs of Brouwer's fixed point theorem are non-constructive as well
- Pareto optimality
- another solution concept
- we want to capture “the best” state of a game, it might be difficult (as in battle of sexes)
- a strategy profile s in G Pareto dominates s′, written s′≺s if, for every player i, ui(s)≥ui(s′) and there exists a player j such that uj(s)>uj(s′)
- a strategy profile s∈S is Pareto optimal if there does not exist another strategy profile s′∈S that Pareto dominates s
- in zero-sum games, all strategy profiles are Pareto-optimal
- not all NE are Pareto-optimal (the NE in Prisoner's dilemma)
- Vilfredo Pareto
- Pareto principle: for many outcomes, roughly 80% of consequences come from 20% of the causes
- finding Nash equilibria
- let's start with two-player zero-sum games
- rock paper scissors, chess, table tennis
- let G=(P,A=A1×A2,u) be a zero-sum game
- that is u1(a)+u2(a)=0 for every a∈A
- we can use one payoff matrix M where Mij=u1(i,j)=−u2(i,j)
- for a strategy profile (s1,s2), we write xi=s1(i) and yj=s2(j) representing (s1,s2) with mixed strategy vectors x=(x1,…,xm) and y=(y1,…,yn) that satisfy ∑xi=1 and ∑yj=1
- the expected payoff of player 1 then equals u1(s)=∑a=(i,j)∈Au1(a)s1(i)s2(j)=∑i∑jMijxiyj=xTMy=−u2(s)
- player's 2 best response to a strategy x of 1 is a vector y∈S2 that minimizes xTMy
- β(x)=minyxTMy … best expected payoff of 2
- player's 1 best response to a strategy y of 2 is a vector x∈S1 that maximizes xTMy
- α(y)=maxxxTMy
- …
- the minimax theorem
- for every zero-sum game, worst-case optimal strategies for both players exist and can be efficiently computed
- there is a number v such that, for any worst-case optimal strategies x∗ and y∗, the strategy profile (x∗,y∗) is a Nash equilibrium and β(x∗)=(x∗)TMy∗=α(y∗)=v
- remarks
- it was proved by John von Neuman
- it tells us everything about zero-sum games
- there are no secrets in zero-sum games – each player can always choose the worst-case optimal strategy
- original proof uses Brouwer's theorem, we will use linear programming
- proof of the minimax theorem
- geometry preliminaries
- hyperplane
- halfspace
- polyhedron – intersection of finitely many halfspaces
- polytope – bounded polyhedron
- d-dimensional polytope is simple … all its vertices are adjacent to exactly d edges
- linear programming preliminaries
- constraints, objective function
- Simplex method, Ellipsoid method
- duality
- the proof
- we'll use the duality
- first, we find the best response, actually the worst-case best response
- …
- bimatrix games
- example: Prisoner's dilemma, collaborative projects
- brute force algorithm (to find NE)
- best response polyhedra
- analysis of the NASH problem (finding nash equilibria in bimatrix games)
- as a decision problem it is trivial – there always exists a Nash equilibrium
- FNP complexity class (“functional NP”)
- clearly, NASH belongs to FNP
- checking whether a strategy profile is NE can be done using the best response condition
- theorem: if NASH is FNP-complete, then NP = coNP
- proof is in the lecutre notes (it is not required)
- therefore, it is unlikely that NASH is FNP-complete
- new complexity class
- END-OF-THE-LINE problem
- the graph can be exponentially large with respect to the input size
- class PPAD
- other solution concepts
- ε-NE
- correlated equilibrium
- linear program
- any feasible solution will be a correlated equilibrium
- we can construct the objective function to maximize the general wellbeing
- regret minimization of an agent in a hostile environment
- we have an agent A in an adversary environment
- there are N available actions for A in the set X={1,…,N}
- at each step t=1,…,T
- our agent selects a probability distribution corresponding to the probability of choosing each action a∈A
- adversary chooses a loss vector, loss ∈[−1,1]
- to be able to tell how well is our agent doing, we compare his loss to the loss of the best agent for some comparison class A
- we will mostly consider the class AX={Ai:i∈X}, where an agent Ai always chooses action i
- external regret … RAT=LAT−min{LBT:B∈AX}=LAT−min{LiT:i∈X}
- for now, we will consider loss ∈{0,1}
- it might seem that the class AX contains too simple agents
- however, we show that large comparison classes lead to a very large regret
- let Aall be the set of agents that assign probability 1 to an arbitrary action from X in every step
- …
- algorithms
- greedy
- randomized greedy
- polynomial weights
- no-regret dynamics
- modern proof of the minimax theorem
- coarse correlated equilibrium
- CE is always CCE
- but CCE does not imply CE
- hierarchy of Nash equilibria
- converging to CCE
- regrets
- external regret
- we compare to agents playing only one action
- internal regret
- we compare to agents that play the same actions as my agent (with same probabilities), only replacing one action by some other one
- swap regret
- the same as internal but we allow to replace multiple actions (while keeping the probabilities)
- we have polynomial weights algorithm for external regret
- how to construct algorithm for swap regret?
- we will use black-box reduction
- games in extensive form
- chess … extensive form = tree of the game
- in perfect-information game, all players know the node they are in
- in imperfect-information game, they don't
- we partition decision nodes into information sets where all nodes belong to the same player and have the same moves
- we can convert any normal-form game of two players to an imperfect-information game in extensive form
- pure, mixed and behavioral strategies
- mixed strategy = probability distribution over vectors
- behavioral strategy = vector of probability distributions
- in general, the expressive power of behavioral strategies and mixed strategies are incomparable – but there is a class of extensive games for which the two definitions coincide
- finding NE in extensive games
- sequence form
Mechanism design
- single item auctions
- seller
- n bidders
- each has valuation vi
- each privately communicates a bid bi to the seller
- seller decides
- who receives the item (if anyone)
- the selling price p
- bidders' utility
- ui=0 … if the bidder lost the auction
- ui=vi−p … if the bidder won
- we will eventually assume that p≤vi
- design
- selling the item for free to the bidder with the highest bid
- not a good idea
- “who can name the highest number”
- selling the item to the bidder with the highest bid for the price equal to the bid
- does not work that well
- what should be the bid? the bidder does not know what to do
- dominant strategy for bidder i … strategy that maximizes ui no matter what the other bidders do
- social surplus … ∑ivixi
- xi is the indicator of the bidder winning the auction
- to maximize the social surplus for single item auction, we need the bidder with the highest valuation to win
- an auction is awesome if it satisifies…
- strong incentive guarantees
- strong performance guarantees
- computational efficiency
- there exists an awesome auction!
- Vickrey's auction
- the winner pays the second highest bid
- single parameter environments
- vi … valuation “per unit of the goods”
- Myerson's lemma
- maximizing expected revenue
- virtual valuation
- φi(x)=x−fi(x)1−F
- where fi(x)1−F can be regarded as the loss of not knowing the exact valuation
- Bulow-Klemperer theorem
- back to social surplus: knapsack auctions
- n bidders
- each bidder has
- public size wi≥0
- private valuation vi≥0
- seller has a capacity W≥0
- feasible set X … 0/1-vectors such that∑ixiwi≤W
- we cannot be awesome :(
- multi-parameter mechanism design
- n bidders
- Ω … finite set of outcomes
- ∀ω∈Ω: each bidder has a private valuation vi(ω)≥0
- ∀ω∈Ω: each bidder submits his bids bi(ω)≥0
- our goal is to design a mechanism that selects an outcome ω∈Ω so that it maximizes the social surplus ∑ivi(ω)
- bidder A winning and bidder B winning are two separate outcomes → bidders can have opinions about each other bidder winning the item as well
- example: single-item auction
- Ω={ω1,…,ωn,ω∅}
- VCG mechanism
- states that we can do DSIC social surplus maximization
- note: allocation rule can assume that the bidders bid truthfully if we consider DSIC – but the payment formula has to provide DSIC
- proof of Myerson's lemma
- payment difference sandwich
- y,z are not functions but only coefficients (bids)
- xi(y;b−i)−xi(z;b−i) has to be non-positive
- because z⋅()≤y⋅() and y<z
- revelation principle
- is DSIC too restrictive?
- if there is a dominant strategy, we can assume that it is truthtelling