Algorithmic collusion under competitive design

Published: 14 October 2025 by MDPI in The 1st International Electronic Conference on Games, session Algorithmic, Computational and Applied Game Theory

Abstract: In this article, I study a model in which two companies use Q-learning algorithms to repeatedly play a game from a class that includes first- and second-price auctions, Bertrand competition, and the prisoner's dilemma. Previous papers have highlighted the tendency of these algorithms to behave collusively without any instruction to do so (algorithmic collusion), depending on their parameterization. Unlike previous papers in the literature, I assume that the companies are free to select the parameters of their algorithms strategically. Thus, in a first stage, the companies simultaneously select parameters for their algorithms; in a second stage, the algorithms repeatedly play the game (for example, an automated auction for advertisement spots) and the designers collect the limiting payoffs. I show that, under the (mere) assumption that the companies' machines have limited capacities, every Nash equilibrium of this two-stage game features some algorithmic collusion (Theorem 1). In other words, shifting the margin of competition to the design of the algorithms is not sufficient to avoid algorithmic collusion. In the second part, I use extensive numerical simulations to study the restriction of this model to a repeated prisoner's dilemma played by ε-greedy algorithms. These simulations reveal the strategic role of exploration levels and show how exploration hampers collusion in equilibrium.
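
To make the second-stage dynamics concrete, the following Python sketch has two ε-greedy Q-learning agents repeatedly play a prisoner's dilemma, with each agent's ε standing in for a stage-1 design choice. This is a minimal illustration, not the paper's implementation: the payoff matrix, learning rate alpha, discount gamma, one-period memory state, and the ε values passed to run() are all assumptions made here for the example.

    import random

    ACTIONS = ["C", "D"]  # cooperate (the "collusive" action) / defect
    PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
               ("D", "C"): (5, 0), ("D", "D"): (1, 1)}  # assumed payoffs

    class EpsilonGreedyQ:
        def __init__(self, epsilon, alpha=0.1, gamma=0.95):
            self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
            # State = previous action profile; restricting the agent to this
            # one-period memory is a stand-in for limited machine capacity.
            self.q = {}

        def act(self, state):
            if random.random() < self.epsilon:
                return random.choice(ACTIONS)  # explore
            return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))  # exploit

        def update(self, state, action, reward, next_state):
            # Standard one-step Q-learning update.
            best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
            old = self.q.get((state, action), 0.0)
            self.q[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)

    def run(eps1, eps2, periods=200_000, seed=0):
        # Stage 2: the two algorithms play repeatedly and the designers collect
        # the long-run average payoffs; eps1, eps2 are the stage-1 choices.
        random.seed(seed)
        a1, a2 = EpsilonGreedyQ(eps1), EpsilonGreedyQ(eps2)
        state, totals = ("D", "D"), [0.0, 0.0]
        for _ in range(periods):
            x, y = a1.act(state), a2.act(state)
            r1, r2 = PAYOFFS[(x, y)]
            next_state = (x, y)
            a1.update(state, x, r1, next_state)
            a2.update(state, y, r2, next_state)
            state = next_state
            totals[0] += r1
            totals[1] += r2
        return totals[0] / periods, totals[1] / periods

    if __name__ == "__main__":
        # Illustrative parameter choices, not values taken from the paper.
        print(run(eps1=0.05, eps2=0.05))

Average payoffs above the one-shot equilibrium value (1, 1) would indicate collusive behavior; sweeping run() over a grid of (eps1, eps2) pairs mimics the strategic parameter-selection stage studied in the paper.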
Keywords: Algorithmic collusion; Q-learning; Reinforcement Learning; Multi-Agent Reinforcement Learning
