Facebook's artificial intelligence team has made what it describes as a
superhuman poker champion, a bot with the ability to beat
world-leading human pros. The company is hailing its AI bot, named Pluribus,
as a major breakthrough: the first capable of beating as many as six players
in a game that involves what is
known as hidden information - the other players' hole cards.
In recent years there have been great strides in artificial intelligence
(AI), with games often serving as challenge problems, benchmarks, and
milestones for progress. Poker has served for decades as such a challenge
problem. Past successes in such benchmarks, including poker, have been limited
to two-player games. However, poker in particular is traditionally played with
more than two players. Multiplayer games present fundamental additional issues
beyond those in two-player games, and multiplayer poker is a recognized AI
milestone. The past two decades have witnessed rapid progress in the
ability of AI systems to play increasingly complex forms of poker, but all prior
breakthroughs have been limited to settings involving only two players. Here,
however, we have Pluribus, an AI capable of defeating elite human professionals
in six-player no-limit Texas hold'em poker, the most commonly played poker
format in the world.
The strength of Pluribus's strategy was
computed via self-play, in which it played against copies of itself,
without any data of human or prior AI play used as input. Pluribus started from
scratch by playing randomly, and gradually improved as it determined which
actions, and which probability distribution over those actions, led to better
outcomes against earlier versions of its strategy.
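The excerpt does not spell out the update rule, but the self-play loop described above (maintain a probability distribution over actions, shift weight toward actions that would have done better against earlier play) can be illustrated with regret matching, the core update in the counterfactual-regret-minimization family of algorithms that poker bots of this kind build on. The sketch below is a toy in plain Python, not Facebook's code: two copies of the same learner play rock-paper-scissors against each other, starting from uniform random play, and their time-averaged strategies drift toward the game's equilibrium.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    # +1 if action a beats action b, -1 if it loses, 0 on a tie
    return [0, 1, -1][(a - b) % 3]

def strategy(regret):
    # Regret matching: play positive-regret actions in proportion
    # to their accumulated regret; with no positive regret, play uniformly.
    pos = [max(r, 0.0) for r in regret]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / ACTIONS] * ACTIONS

def train(iterations=20000):
    regret = [[0.0] * ACTIONS for _ in range(2)]
    strat_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy(regret[p]) for p in range(2)]
        acts = [random.choices(range(ACTIONS), weights=strats[p])[0]
                for p in range(2)]
        for p in range(2):
            opp = acts[1 - p]
            for a in range(ACTIONS):
                # regret: how much better alternative a would have done
                regret[p][a] += payoff(a, opp) - payoff(acts[p], opp)
                strat_sum[p][a] += strats[p][a]
    # The time-averaged strategy, not the final one, is what converges.
    return [[s / iterations for s in strat_sum[p]] for p in range(2)]

avg = train()
print([round(p, 2) for p in avg[0]])  # each entry near 0.33
```

In rock-paper-scissors the equilibrium is uniform play; Pluribus runs a far more elaborate version of this idea over the vastly larger decision tree of no-limit hold'em.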
One of the
remarkable things about creating the blueprint strategy for Pluribus was that
it was calculated in a mere 8 days on a 64-core server for a total of 12,400
CPU core hours. It required less than 512 GB of memory. At standard cloud
computing rates that would cost about $100 to produce. This is in sharp
contrast to all the other recent superhuman AI milestones for games, which used
large numbers of servers and/or farms of GPUs or supercomputers at great
expense. For instance, efforts from Google's AI research shop DeepMind
have relied on supercomputers consisting of more than 5,000 specialist
processors, at a reported cost of millions of dollars.
The computing power necessary for AI experiments is seen as a key hurdle to the
technology's development, with the demand for computing currently growing
faster than processors are becoming more efficient.
The Facebook team's research also makes humbling reading for any poker players
proud of their ability to spot a tell.
"We think of bluffing as this very human trait," explained Noam Brown, the lead
researcher from Facebook's AI team, speaking to BBC News.
"But what we see is that bluffing is actually mathematical behaviour.
"When the bot bluffs, it doesn't view it as deceptive or
dishonest, it's just the way to make the most money."
Brown went on to say that neither he nor Facebook had any plans to use the AI
in real poker games. Indeed, the firm has said it is not publicly disclosing
much of the code for fear of having a negative impact on the poker community.
It would, a spokesman said, provide examples of techniques to other researchers
working on AI.
Beyond poker, Mr Brown would not be drawn on what
practical use Facebook might have in mind for the technology.
Pluribus played against elite human professionals in two formats: five human
professionals playing with one copy of Pluribus (5H+1AI), and one human
professional playing with five copies of Pluribus (1H+5AI). Performance was
measured using the standard metric in this field of AI, milli big blinds per
game (mbb/game). This measures how many big blinds (the initial money the
second player must put into the pot) were won on average per thousand hands of
poker.
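As a quick illustration of the unit (a hypothetical helper, not code from the study): winning a total of 480 big blinds over 10,000 hands works out to 48 mbb/game.

```python
def mbb_per_game(big_blinds_won, hands_played):
    """Milli big blinds per game: thousandths of a big blind won per hand,
    i.e. big blinds won per 1,000 hands."""
    return 1000.0 * big_blinds_won / hands_played

print(mbb_per_game(480, 10000))  # 48.0
```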
The human participants in the 5H+1AI experiment were Jimmy Chou,
Seth Davies, Michael Gagliano, Anthony Gregg, Dong Kim, Jason Les, Linus
Loeliger, Daniel McAulay, Greg Merson, Nicholas Petrangelo, Sean Ruane, Trevor
Savage, and Jacob Toole. In this experiment, 10,000 hands of poker were played
over 12 days. Each day, five volunteers from the pool of professionals were
selected to participate based on availability. The participants were not told
who else was participating in the experiment. Instead, each participant was
assigned an alias that remained constant throughout the experiment. The alias
of each player in each game was known, so that players could track the
tendencies of each player throughout the experiment. $50,000 was divided among
the human participants based on their performance to incentivize them to play
their best. Each player was guaranteed a minimum of $0.40 per hand for
participating, but this could increase to as much as $1.60 per hand based on
performance.
After applying a variance-reduction technique (AIVAT),
Pluribus won an average of 48 mbb/game (with a standard error of 25 mbb/game).
This is considered a very high win rate in six-player no-limit Texas
hold'em poker, especially against a collection of elite professionals, and
implies that Pluribus is stronger than the human opponents. A top pro would
expect to win 50 mbb/game against average players (a game is a full lap of the
button, so that every player posts the big and small blinds).
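The win rate and standard error together are enough to gauge statistical significance. As a back-of-the-envelope check (an illustrative calculation assuming an approximately normal estimate, not a reproduction of the paper's own analysis), a one-sided test of the null hypothesis "true win rate is at most zero" gives:

```python
import math

def one_sided_p(win_rate, std_err):
    """One-sided p-value against 'true win rate <= 0', assuming the
    estimated win rate is approximately normally distributed."""
    z = win_rate / std_err  # number of standard errors above zero
    return 0.5 * math.erfc(z / math.sqrt(2))

# 48 mbb/game with a standard error of 25 mbb/game
p = one_sided_p(48, 25)
print(round(p, 3))  # about 0.027, significant at the 5% level
```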
The human participants
in the 1H+5AI experiment were Chris "Jesus" Ferguson and Darren
Elias. Each of the two humans separately played 5,000 hands of poker against
five copies of Pluribus. Pluribus does not adapt its strategy to its opponents
and does not know the identity of its opponents, so the copies of Pluribus
could not intentionally collude against the human player. To incentivize strong
play, we offered each human $2,000 for participation and an additional $2,000
if he performed better against the AI than the other human player did. The
players did not know who the other participant was and were not told how the
other human was performing during the experiment. For the 10,000 hands played,
Pluribus beat the humans by an average of 32 mbb/game.
Because Pluribus's strategy
was determined entirely from self-play without any human data, it also provides
an outside perspective on what optimal play should look like in multiplayer
no-limit Texas hold'em. Pluribus confirms the conventional human wisdom
that "limping" (calling the big blind rather than folding or raising)
is suboptimal for any player except the small blind player, who
already has half the big blind in the pot by the rules, and thus has to invest
only half as much as the other players to call. While Pluribus initially
experimented with limping when computing its blueprint strategy offline through
self-play, it gradually discarded this action from its strategy as self-play
continued. However, Pluribus disagrees with the folk wisdom that "donk
betting" (starting a round by betting when one ended the previous betting
round with a call) is a mistake; Pluribus does this far more often than
professional humans do.
Forms of self-play
combined with forms of search have led to a number of high-profile successes in
perfect-information two-player zero-sum games. However, most real-world
strategic interactions involve hidden information and more than two players.
This makes the problem very different and significantly more difficult both
theoretically and practically. Developing a superhuman AI for multiplayer poker
was a widely recognized milestone in this area and the major remaining
milestone in computer poker. In this paper we described Pluribus, an AI capable
of defeating elite human professionals in six-player no-limit Texas
hold'em poker, the most commonly played poker format in the world.
Pluribus's success shows that despite the lack of known strong theoretical
guarantees on performance in multiplayer games, there are large-scale, complex
multiplayer imperfect-information settings in which a carefully constructed
self-play-with-search algorithm can produce superhuman strategies.