AlgoGT

AlgoGT 2011

Workshop on Algorithmic Game Theory: Dynamics and Convergence in Distributed Systems

June, 20-21th 2011, Grenoble, France

10:00 - 10:15: Opening (C. Touati)

10:15 - 12:15 : Algorithms for Communication Networks
Session chair: Emmanuel Hyon (University Paris Ouest Nanterre)

Giovanni Neglia, Distributed Sub-Gradient Methods for Delay Tolerant Networks [abstract] [slides]
Stefano Iellamo, Imitation in a CSMA/CA based cognitive network [abstract] [slides]
Samir Perlaza, Satisfaction Equilibrium: Definition, Learning Dynamics and Applications [abstract] [slides]

14:00 - 16:00 : Computability in Distributed Systems
Session chair: Bruno Gaujal (INRIA)

Olivier Bournez, Computing with Large Populations [abstract] [slides]
Olivier Teytaud, Partially observable 2-player games [abstract] [slides]
Loig Jezequel, Distributed optimal planning [abstract] [slides]

16:30 - 17h50 : Evolutionary Dynamics
Session chair: Olivier Bournez (Ecole Polytechnique)

Yannick Viossat, Evolutionary Game Dynamics and convergence to equilibria : a survey [abstract] [slides]
Panayotis Mertikopoulos, Learning Dynamics for Nonlinear Games and their Applications to Networks [abstract] [slides]

8:30 - 10:30 : Stochastic Approximations
Session chair: Panayotis Mertikopoulos (Ecole Polytechnique)

David Leslie, Convergent learning algorithms for games with unknown noisy rewards [abstract] [slides]
Mathieu Faure, Consistency of Vanishing Smooth Fictitious Play [abstract] [slides]
Mario Bravo, An adjusted payoff-based procedure [abstract] [slides]

10:50 - 12:10 : Mean Field
Session chair: David Leslie (University of Bristol)

Hamidou Tembine, Mean Field Stochastic Games [abstract] [slides]
Nicolas Gast, Stochastic approximation, differential inclusions and stability [abstract] [slides]

13:50 - 15:10 : Revenue Optimization
Session chair: Yannick Viossat (Université Paris-Dauphine)

Onno Zoeter, Optimal and Robust Price Experimentation: Learning by Lottery [abstract] [slides]
Loubna Echabbi, Coalition stability under Shapley value revenue sharing mechanism : possible applications in telecommunication area [abstract]

15:10 - 17:00 : Open Discussions

Olivier Bournez, Ecole Polytechnique

Computing with Large Populations. We define a general model capturing the behavior of a population of anonymous agents that interact in pairs. This model captures some of the main features of opportunistic networks, in which nodes (such as the ones of a mobile ad hoc networks) meet sporadically. For its reminiscence to Population Protocol, we call our model Large-Population Protocol, or LPP. We are interested in the design of LPPs enforcing, for every ν ∈ [0,1], a fraction ν of the agents to be in a specific subset of marked states, when the size of the population grows to infinity, in which case, we say that the protocol computes ν. We characterize the numbers that are computable by LPPs. Precisely, we prove that, for every ν ∈ [0,1], ν is computable by an LPP if and only if ν is algebraic. The fact that transcendent numbers cannot be computed by LPPs is a consequence of Tarski's effective procedure for quantifier elimination over real closed fields. Our positive result is constructive. That it, for every algebraic number ν ∈ [0,1], we show how to construct a protocol which computes ν.

Mario Bravo, Université Pierre et Marie Curie

An adjusted payoff-based procedure. We study a simple adaptive model in the framework of an N-player normal form game. The model consists of a repeated game where the players only know their own strategy space and their own payoff received at each stage. The information about the other agents (actions and payoffs) is unknown. In particular, we consider a case where each player is allowed to use the number of times she has played each action to update her strategy. The resultant stochastic process is analyzed via the so-called ODE method from stochastic approximation theory. We are interested in the convergence of the process to rest points of a related continuous dynamics. Results concerning almost sure convergence and convergence with positive probability are obtained and we apply these results to a traffic game. Also, we provide some examples where convergence occurs with probability zero. Finally, we verify that part of the analysis holds when players are facing a random environment.

Loubna Echabbi, Institut National des Postes et Télécommunications

Coalition stability under Shapley value revenue sharing mechanism : possible applications in telecommunication area In this talk we consider situations where different service providers compete for a shared market basis where regulation is not an issue. Services as grids, games and other contents are examples. The market basis may be completely shared or structured geographically in a way that influences potential relationships with providers. We are interested in situations where coalitions may arise and investigate conditions where the choice of the revenue sharing mechanism leads to the stability of the grand coalitions. We focus on shapley values as such mechanism and present specific structure of games with such helpful stability properties.

Mathieu Faure, Institut de Mathématiques de Neuchâtel

Consistency of Vanishing Smooth Fictitious Play We discuss consistency of Vanishing Smooth Fictitious Play, a strategy in the context of game theory, which can be regarded as a smooth fictitious play procedure, where the smoothing parameter is time-dependent and asymptotically vanishes. We rely on stochastic approximation technics. This answers a question initially raised by Drew Fudenberg and Satoru Takahashi.

Nicolas Gast, Ecole Polytechnique Fédérale de Lausanne

Stochastic approximation, differential inclusions and stability Approximations of stochastic processes by ordinary deterministic differential equations (ODE) allows one to reduce the study of complex stochastic systems. This technique is broadly used to study the performance of large systems (via mean field models) or the stability of queuing networks (via fluid limits).
However, many system exhibits non-continuous dynamics that causes problem for defining correctly the limiting ODE. In this talk, I will present how to handle this problem of discontinuity using differential inclusions. I will show how this can be applied to study the stability of scheduling heuristics used in wireless networks.

Stefano Iellamo, Telecom ParisTech

Imitation in a CSMA/CA based cognitive network (joint work with Lin Chen and Marceau Coupechoux) In this presentation, we tackle the problem of opportunistic spectrum access in cognitive radio networks where a number of unlicensed Secondary Users (SU) operating on the standard CSMA/CA protocol access a number of frequency channels partially occupied by licensed Primary Users (PU). We apply evolutionary game theory to model the spectrum access problem and derive distributed mechanisms to converge to the Nash equilibrium. To this end, we combine a payoff computation methodology, relying on the estimation on the number of SUs on the same channel, with the channel access policy derived by the evolutionary game model. The conducted numerical analysis shows that a fast convergence is achieved and the proposed mechanisms are robust against errors in payoff computation.

Loig Jezequel, ENS Cachan Bretagne

Distributed optimal planning. We consider a multi-agent system where each agent is a discrete event system or an automaton: it is assigned an initial state, a goal state, and a set of possible actions changing its current state. Classical planning consists in efficiently organizing actions to reach the goal. In distributed optimal planning, the agents have to collaborate on some actions, so they must jointly reach their goals. In addition, actions have a cost for each agent, and the objective is to design a global plan (i.e a tuple of compatible local plans, one for each agent) with minimal cost (that is such that the sum of all local plan costs is minimal).
In this presentation, we provide a modeling of this problem in the setting of networks of weighted automata. We describe a distributed procedure to solve the problem, where agents simply exchange messages on their local plans and their respective costs. Messages and their processing are derived from weighted automata calculus. It is shown that the procedure convergences to an optimal plan, regardless of the chaotic communication scheme. Organized communications can only affect the convergence speed. This distributed optimization procedure can be seen as a degenerate form of cooperative game. Extensions to a true game setting are natural, for example by assuming that each agent aims at minimizing its own cost function (instead of the global one), while still being forced to collaborate with others to guarantee that everyone reaches the goal.

David Leslie, University of Bristol

Convergent learning algorithms for games with unknown noisy rewards (joint work with Archie Chapman, Alex Rogers and Nick Jennings) We address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and which must be estimated over time from noisy observations. These games arise in many real-world applications, whenever rewards for actions cannot be prespecified and must be learned on-line. Using results from stochastic approximation and differential inclusions, we prove the convergence of variants of fictitious play and adaptive play to Nash equilibria in potential games and weakly acyclic games, respectively. These variants all use a multi-agent version of Q-learning to estimate the reward functions and a novel form of the ε-greedy decision rule to select an action. Furthermore, we derive ε-greedy decision rules that exploit the sparse interaction structure encoded in two compact graphical representations of games, known as graphical and hypergraphical normal form, to improve the convergence rate of the learning algorithms. The structure captured in these representations naturally occurs in many distributed optimisation and control applications. Finally, we demonstrate the efficacy of the algorithms in a simulated ad hoc wireless sensor network management problem.

Panayotis Mertikopoulos, Ecole Polytechnique

Learning Dynamics for Nonlinear Games and their Applications to Networks. The recent deployment of massively large communication networks which cannot be centrally administered has brought to the forefront the need for distributed learning algorithms that allow the users of a network to reach an efficient, stable state by themselves. Game theory provides a very attractive array of learning dynamics of this sort, but these dynamics are intimately tied to the linear structure of finite games, and, unfortunately, network problems do not always adhere to such linear structures.
We will explore some ways to bridge this gap between theory and practice by proposing a learning method for nonlinear games played over general spaces and with nonlinear payoff functions, and we will illustrate the method's effectiveness in concrete network problems. We will begin by examining the problem of efficient routing in wireline networks with nonlinear latencies: there, by employing a suitably modified variant of the replicator dynamics of evolutionary game theory, users converge to a state which minimizes their aggregate delay exponentially quickly. We then extend these convergence results to power allocation scenarios in multi-antenna (MIMO) wireless networks, where, despite the fact that the users' strategies are drawn from nonlinear cones of positive-definite matrices, users still manage to converge to an efficient equilibrium (even in the ergodic fast-fading regime).

Giovanni Neglia, INRIA

Distributed Sub-Gradient Methods for Delay Tolerant Networks In this paper we apply distributed sub-gradient methods to optimize global performance in Delay Tolerant Networks (DTNs). These methods rely on simple local node operations and consensus algorithms to average neighbours' information. Existing results for convergence to optimal solutions can only be applied to DTNs in the case of synchronous operation of the nodes and memory-less random meeting processes. In this paper we address both these issues. First, we prove convergence to the optimal solution for a more general class of mobility models. Second, we show that, under asynchronous operations, a direct application of the original sub-gradient method would lead to suboptimal solutions and we propose some adjustments to solve this problem. Further, at the end of the paper, we illustrate a possible DTN application to demonstrate the validity of this optimization approach.

Samir Perlaza, Alcatel Lucent Chair in Flexible Radio at Supelec

Satisfaction Equilibrium: Definition, Learning Dynamics and Applications (joint work with Hamidou Tembine, Samson Lasaulce, Mérouane Debbah) In this presentation, we introduce the notion of satisfaction equilibrium (SE), learning dynamics that converge to SE and some applications in wireless communications. In contrast to existing equilibrium notions, for instance Nash equilibrium (NE) and generalized NE (GNE), the idea of performance optimization in the sense of utility maximization or cost minimization does not exist. The concept of SE relies on the fact that players might be either satisfied or unsatisfied with their achieved performance. At the SE, if it exists, all players are satisfied. This notion of equilibrium perfectly models the problem of QoS provisioning in decentralized self-configuring networks. Here, radio devices are satisfied if they are able to provide the requested QoS. Within this framework, the concept of SE is formalized for both pure and mixed strategies. In both cases, sufficient conditions for the existence and uniqueness of the SE are presented. Learning methods that allows players to achieve a SE in pure strategies in finite time are discussed. Finally, a power control game in the interference channel is used to highlight the advantages of modeling QoS problems following the notion of SE rather than other equilibrium concepts.

Hamidou Tembine, Supelec

Mean Field Stochastic Games. We consider a class of stochastic games with resource states, types and individual internal states. At each stage, a random set of players interact. The states and the actions of all the interacting players determine together the instantaneous payoffs and the transitions to the next states. The talk will discuss the mean field equilibrium in stochastic games in a large-population regime, with connections to the propagation of chaos. We examine the (non-)convergence of the stochastic game with a variable set of interacting players when the total number of possible players grows without bound. Under specific structure of transition kernels (that the asymptotic indistinguishability per type), mean-field limit arises because of the large number of players. We show that the optimal payoffs and the mean field equilibrium payoffs are solutions of a coupled system of backward-forward equations. The limiting games are closely related to discrete time anonymous sequential population games or to differential population games. Using multidimensional diffusion processes, a general mean field convergence to a coupled stochastic differential equation is given. We illustrate both stable and unstable behaviors of the controlled mean field limit in wireless networks. Finally, we analyze the behavior of the system in presence of risk-sensitive players. Risk-sensitivity is captured through exponential of long-term payoff for each player. We derive a risk-sensitive version of Bellman-Shapley equation via multiplicative dynamic programming and a Kolmogorov forward equation. As a special case results, irreducible states will be presented.

Olivier Teytaud, INRIA

Partially observable 2-player games (joint work with D. Auger and F. Teytaud) It is commonly said that partially observable games are decidable in the two-player case and undecidable in team play (2 vs 1). We will show that this depends on the exact decidability question, and that the 2-player case is indeed undecidable for the most natural question. We will then discuss adaptations of the MCTS algorithm for restricted (decidable) versions of partially observable games using bandit algorithms (e.g. EXP3).

Yannick Viossat, Université Paris-Dauphine

Evolutionary Game Dynamics and convergence to equilibria : a survey. We survey results on convergence to Nash or correlated equilibria of three kinds of game dynamics : classical evolutionary dynamics (e.g. replicator or best-reply dynamics) ; no-regret dynamics ; learning à la Peyton Young.

Onno Zoeter, Xerox Research Center Europe

Optimal and Robust Price Experimentation: Learning by Lottery (joint work with Christopher R. Dance) This talk studies optimal price learning for one or more items. We introduce the Schrödinger price experiment (SPE) which superimposes classical price experiments using lotteries, and thereby extracts more information from each customer interaction. If buyers are perfectly rational we show that there exist SPEs that in the limit of infinite superposition learn optimally and exploit optimally. We refer to the new resulting mechanism as the hopeful mechanism (HM) since although it is incentive compatible, buyers can deviate with extreme consequences for the seller at very little cost to themselves. For real-world settings we propose a robust version of the approach which takes the form of a Markov decision process where the actions are functions. We provide approximate policies motivated by the best of sampled set (BOSS) algorithm coupled with approximate Bayesian inference. Numerical studies show that the proposed method significantly increases seller revenue compared to classical price experimentation, even for the single-item case.