In a stunning victory completed tonight the Libratus Poker AI, created by Noam Brown et al. at Carnegie Mellon University, has beaten four human professional players at No-Limit Hold'em.
For the first time in history, the poker-playing world is facing a future of machines taking over the game of Texas Hold'em.
Is poker, after Chess and Go, the next game that has been “solved” by computers? Are humans done at poker, at least when it comes to beating an advanced AI?
Will Libratus’ colossal victory change the way we approach the game? Let's try and answer a few (or all) of those questions.
From Zero to Hero in 2 Years
Two years ago a team from Carnegie Mellon University developed a computer program with the goal of beating the best players in Heads-Up No-Limit-Hold’em, one of the more complex poker variants.
Back then the program struggled against four professional players and eventually lost to its human counterparts.
But the developers of the AI used the past two years to improve the program immensely - and their improvements were extraordinary.
A re-match was scheduled against four of the best heads-up poker players. 120,000 hands, a statistically significant sample, were played and the result was a total and utter annihilation of the human players by the AI.
Who Was Playing?
Dong Kim, Jason Les, Jimmy Chou and Daniel McAulay - four distinguished and well-versed poker players - represented the humans in this challenge.
Kim is a highly successful online high-stakes player; Les was twice in striking range of a WSOP bracelet in 2015 when he finished second and third in WSOP events; Chou won the Asia Championship of Poker one year ago and McAulay has won several hundred thousand dollars playing online tournaments.
Most importantly: All four players excel at Heads-Up No-Limit-Hold’em, the game that was played during the challenge.
The Libratus AI was developed by a team of researchers at Carnegie Mellon led by PhD student Noam Brown and professor Tuomas Sandholm. It's a successor to the Claudico AI, which lost its challenge against the humans two years ago.
Special Rules to Reduce Luck
This challenge lasted for 120,000 hands (30,000 per player) and ran from January 11 to 30. For each hand the player and the AI started with 20,000 chips, with the blinds at 50/100.
This ensured that every hand was played with a stack size of 200 big blinds -- reasonably deep stacks for heads-up poker which allowed plenty of room for strategic moves in each hand.
To reduce the luck factor, which might heavily skew the results, two special rules were put in place:
1. All hands were mirrored. For example: when Player A got aces vs. kings at one table against the AI, Player B received the kings while the AI had aces at the same time. Thus no party could just run hot over the course of the challenge.
2. No hard all-ins. When a hand was all-in before the river no more cards were dealt and each player received his equity in chips. If a player was ahead 70/30 on the turn and got all-in, he was credited with 70% of the pot and the opponent got 30%. This also reduced the luck factor.
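The all-in rule simply splits the pot by equity instead of dealing out the remaining cards. A minimal sketch, assuming a hypothetical helper `settle_all_in` that receives the first player's equity as an already-computed number (how the challenge software calculated equity is not described here):

```python
def settle_all_in(pot, equity):
    """Split a pot by equity instead of dealing the remaining cards.

    equity is the first player's share of the pot (0.0 to 1.0), assumed to be
    computed elsewhere, with split pots already folded into that number.
    """
    if not 0.0 <= equity <= 1.0:
        raise ValueError("equity must be between 0 and 1")
    first = round(pot * equity)   # whole chips for the player who is ahead
    return first, pot - first     # the rest goes to the opponent


# Both players all-in for 20,000 chips, 70/30 on the turn:
print(settle_all_in(40_000, 0.7))  # -> (28000, 12000)
```

Because every all-in pays out its expected value, a player who gets the money in ahead is never punished by a bad river card.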
After 20 days and 120,000 hands played, the result was shockingly unambiguous: Libratus beat each player and won at a rate of $14.72 per hand.
This equates to a win rate of 14.7 big blinds per 100 hands – an outstanding result for the AI. All four human players lost over their 30,000 hands against Libratus. This is how they performed individually:
|Player|Total result|Result per hand|
|---|---|---|
|Sum / Average|-$1,766,250|-$14.72|
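The headline numbers can be checked with a few lines of arithmetic. This sketch treats one chip as one dollar, as the figures above do, and uses only numbers stated in this article:

```python
BIG_BLIND = 100                  # blinds were 50/100
STARTING_STACK = 20_000          # chips each side started every hand with
HANDS = 120_000                  # 4 players x 30,000 hands
TOTAL_HUMAN_RESULT = -1_766_250  # combined result of the four humans

stack_in_bb = STARTING_STACK / BIG_BLIND           # stack depth in big blinds
per_hand = -TOTAL_HUMAN_RESULT / HANDS             # Libratus' average win per hand
bb_per_100 = per_hand / BIG_BLIND * 100            # same rate in big blinds per 100 hands

print(stack_in_bb)            # 200.0
print(f"{per_hand:.2f}")      # 14.72
print(f"{bb_per_100:.2f}")    # 14.72
```

With 100-chip big blinds, "dollars per hand" and "big blinds per 100 hands" happen to be the same number, which is why both figures read 14.7.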
Maybe the AI Was Just Lucky?
While the rules of the challenge were set to reduce the luck factor as much as possible, chance still plays a big role in the results of each hand – even with mirrored hands and even with the elimination of all-in luck.
So maybe, just maybe, the human players are actually better but the AI just got lucky. Let's look at some statistics regarding the results.
The AI won with a win rate of 14.7 big blinds per 100 hands. 120,000 hands were played and the standard deviation was somewhere between 100 and 200 big blinds per 100 hands.
Those are just rough estimates of the standard deviation, but as we'll see they're good enough bounds. With those numbers we can run some calculations with a poker variance calculator and answer this question:
What's the probability of the humans actually playing better than the AI but losing at a rate of 14.7 big blinds per 100 hands over 120,000 hands?
It turns out this probability is very low: somewhere between well below 0.0001% (for the lower bound on the standard deviation) and 0.54% (for the upper bound).
Meaning: it's very, very unlikely that the general result of this challenge (the AI plays better than four humans) is due to the AI just getting lucky. The humans didn't lose because of bad luck. The Libratus AI is simply the better player at Heads-Up No-Limit-Hold’em.
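A variance calculator of this kind can be approximated with a normal distribution. The sketch below assumes that 100-hand results are independent and roughly normally distributed, and asks how often a player with a true edge of zero (the best case for the "humans were actually better" hypothesis) would still show a loss of 14.7 big blinds per 100 hands. The function name is hypothetical:

```python
import math


def probability_of_result_by_luck(win_rate_bb100, sd_bb100, hands):
    """One-sided chance of seeing at least this win rate if the true edge were zero."""
    blocks = hands / 100                          # number of 100-hand samples
    standard_error = sd_bb100 / math.sqrt(blocks) # uncertainty of the observed rate
    z = win_rate_bb100 / standard_error           # distance from zero in standard errors
    return 0.5 * math.erfc(z / math.sqrt(2))      # upper tail of the standard normal


for sd in (100, 200):
    p = probability_of_result_by_luck(14.7, sd, 120_000)
    print(f"sd={sd} bb/100: {p:.7%}")
```

With a standard deviation of 200 big blinds per 100 hands this gives about 0.54%; with 100 it drops to roughly 0.00002%. If the humans' true edge were positive, the probability would be even smaller.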
How Does Libratus Work?
Basically the Libratus AI is just a huge set of strategies which define how to play in a certain situation. Two examples of such strategies (not necessarily related to the actual game play of Libratus):
If the game state is preflop and the AI is first to act and holds 7♦ 4♥, then it will raise to 3 big blinds 50% of the time, raise to 5 big blinds 30% of the time and fold 20% of the time.
If the game state is on the turn and the AI faces a raise after having already faced a raise on the flop and holds an ace-high flush-draw on a low board, then it will call 40% of the time and move all-in 60% of the time.
It quickly becomes obvious that there is an astronomically large (but finite) number of different situations the AI can be in, and for each and every one of them the AI has a strategy.
It's worth noting that most situations call for mixed strategies like the two above: sometimes do this, sometimes do that. The AI effectively rolls a die to decide what to do, but the probabilities and actions are pre-calculated and well balanced.
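A mixed strategy is just a lookup table of probabilities plus a random draw. This minimal sketch encodes the first example above; the situation key, the action names and the `choose_action` helper are hypothetical and say nothing about how Libratus actually stores its strategy:

```python
import random

# Hypothetical strategy table mirroring the preflop example above.
STRATEGY = {
    ("preflop", "first_to_act", "7d4h"): {
        "raise_3bb": 0.5,
        "raise_5bb": 0.3,
        "fold": 0.2,
    },
}


def choose_action(situation, rng=random):
    """Look up the precomputed mix for a situation and sample one action from it."""
    mix = STRATEGY[situation]
    actions = list(mix)
    weights = [mix[action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded so the run is repeatable
print(choose_action(("preflop", "first_to_act", "7d4h"), rng))
```

The randomness is the point: an opponent who sees the same spot many times can estimate the mix, but cannot predict any single decision.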
Strategy c/o $10m Super Computer
To generate the strategies for all those situations the team around Brown and Sandholm used a supercomputer called Bridges.
It's roughly 30,000 times faster than an average modern desktop computer, runs on 274 terabytes of RAM and cost $9.65m.
The computer played against itself for many days, accumulating billions, probably trillions, of hands and randomly trying all kinds of different strategies.
Whenever a strategy worked, the likelihood to play this strategy increased; whenever a strategy didn't work, the likelihood decreased. Basically, generating the strategies was a colossal trial and error run.
In an extensive AMA on Reddit, Brown explained the learning process of Libratus like this:
“The basis for the bot is reinforcement learning using a special variant of Counterfactual Regret Minimization. Prior to this competition, it had only played poker against itself. It did not learn its strategy from human hand histories.”
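Counterfactual Regret Minimization is built on a simple rule called regret matching: play each action in proportion to how much you regret not having played it so far. The sketch below is not Libratus' code, and rock-paper-scissors stands in for poker; it only shows the "what worked gets played more" self-play loop on a game whose equilibrium (one third each) is known. Full CFR applies this update at every decision point of the game tree, weighted by the probability of reaching that point.

```python
# PAYOFF[a][b]: payoff for playing a against b (0 = rock, 1 = paper, 2 = scissors)
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(regret_sum):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regret_sum)] * len(regret_sum)


def train(iterations):
    # Start the two players from different regrets so the dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]  # both move simultaneously
        for me in (0, 1):
            opp = strategies[1 - me]
            # Expected payoff of each pure action against the opponent's current mix.
            values = [sum(PAYOFF[a][b] * opp[b] for b in range(3)) for a in range(3)]
            current = sum(strategies[me][a] * values[a] for a in range(3))
            for a in range(3):
                # Actions that would have done better gain regret; worse ones lose it.
                regrets[me][a] += values[a] - current
                strategy_sums[me][a] += strategies[me][a]
    # The *average* strategy over all iterations is what approaches the equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


average = train(50_000)
print([round(p, 3) for p in average[0]])  # each entry close to 0.333
```

The current strategies keep cycling (rock beats scissors, paper beats rock, ...), which is why CFR reports the averaged strategy rather than the latest one.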
Libratus was well prepared for the challenge, but the learning didn't stop there. Each night after the sessions, it looked at the bet sizes the humans had used to probe it and computed more precise strategies for those spots, closing the holes the players had found in its own game rather than trying to exploit theirs.
Complexity is Limited
How can a computer beat seemingly strong poker players? For most players poker is a game of reads, guts, deception and intuition.
A computer doesn't have a gut feeling. A computer doesn't have any intuition.
Unlike Chess or Go, poker is a game with incomplete information and lots of randomness involved. How can a computer excel at such a game?
First, one needs to understand that while poker is a very complex game – much more complex than Chess or even Go – its complexity is limited. There are only so many different ways the cards can be shuffled and only so many possible different distinguishable games to be played.
To put this in numbers: In Heads-Up Limit-Hold'em there are roughly 316,000,000,000,000,000 different game situations. If you played out one of them per second, you'd need 10 billion years to finish them all. That's a lot of game situations.
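The "10 billion years" figure follows directly from the count of game situations:

```python
SITUATIONS = 316_000_000_000_000_000        # roughly 3.16e17 situations in heads-up limit hold'em
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60    # about 31.6 million seconds

years = SITUATIONS / SECONDS_PER_YEAR       # one situation per second
print(f"{years:.2e} years")                 # 1.00e+10, i.e. about 10 billion years
```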
For No-Limit the number is many orders of magnitude higher, since you can bet almost arbitrary amounts (heads-up No-Limit with the 200-big-blind stacks used here has roughly 10^161 decision points), but the fact remains that the total number of different game situations is finite.
No Guts; Just Perfect Strategy
Every game with a finite number of players and strategies has at least one Nash Equilibrium, possibly one in which players randomize between actions. A Nash Equilibrium is a combination of strategies in which no player can do better by changing only their own strategy.
In layman's terms: in a two-player zero-sum game such as heads-up poker, with the players alternating seats, playing a Nash equilibrium strategy means you cannot lose against any other player in the long run. John Nash proved the existence of these equilibria in 1950, and this work earned him a share of the 1994 Nobel Prize in Economics.
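Rock-paper-scissors shows this guarantee in miniature: its equilibrium is to play each option one third of the time. Since the expected payoff is linear in the opponent's mix, checking the three pure strategies is enough to show that no opponent can gain an edge against it. Exact fractions avoid rounding noise:

```python
from fractions import Fraction

# PAYOFF[a][b]: payoff for playing a against b (rock, paper, scissors)
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

ROCK, PAPER, SCISSORS = [1, 0, 0], [0, 1, 0], [0, 0, 1]
equilibrium = [Fraction(1, 3)] * 3


def expected_payoff(mine, theirs):
    """Average payoff of mixed strategy `mine` against mixed strategy `theirs`."""
    return sum(mine[a] * theirs[b] * PAYOFF[a][b] for a in range(3) for b in range(3))


for opponent in (ROCK, PAPER, SCISSORS):
    print(expected_payoff(equilibrium, opponent))  # 0 every time: it never loses

print(expected_payoff(PAPER, ROCK))  # 1: exploiting a predictable opponent
```

The last line shows the flip side: against an opponent who always plays rock, the equilibrium only breaks even, while always playing paper would win every time. An equilibrium strategy guarantees you won't lose in the long run; it doesn't maximize winnings against weak opponents.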
This Nash equilibrium means: guts, reads and intuition don't matter in the end. There is a perfect strategy for poker; we just have to find it.
All you need is a suitable computer that can handle quadrillions of different situations, has enormous amounts of memory and is blazingly fast. Then you put a team of sharp, clever humans in front of it, let them develop a method to make use of the computing power, and you're there.
Is Poker Solved? Are We Done?
Right now Libratus is just the beginning. The AI still simplifies many different poker situations.
For example it might not differentiate between a king-jack high flush-draw and a king-ten high flush-draw. It might not differentiate between betting 55% of the pot and betting 60% of the pot.
But Libratus is already close to having developed a perfect strategy, at least close enough to annihilate any human counterpart. With more time and even more computational power than the current $9.65 million supercomputer powering it, Libratus will only improve its performance.
AIs that will best any player in non heads-up games will likely follow.
What About Other Variants?
Libratus beat humans in No-Limit Heads-Up. Two years ago the University of Alberta introduced Cepheus to the world -- a bot which, for all intents and purposes, plays a perfect Limit Heads-Up strategy.
It's safe to say that those two variants are practically solved. As a matter of fact the guys from the University of Alberta managed to prove that their bot is at worst 0.05 Big Bets per 100 hands away from a perfect (i.e. Nash equilibrium) strategy.
While the No-Limit bot Libratus might be much further away from this perfect strategy, it's only a matter of time before it is refined and gets closer to it.
What about other poker variants? Poker with more than two players is orders of magnitude more complex than heads-up. The same holds true for more difficult variants like Omaha.
But in the world of computers, where computational power still grows exponentially, “orders of magnitude” often only means: “give it a couple more years.”
It's only a matter of time before bots will take over and be the true rulers in the poker world.
But a bot like Libratus is still so complex that it requires a direct connection to its enormous supercomputer while playing. And it still plays remarkably slowly. So there's no direct danger of it being used in your local casino or online game.
But it won't be that long before comparable bots will be able to run on our smart devices.
Aren't We There Already?
The scary fact is: Bots don't even have to play a perfect strategy. And they don't have to beat the best players.
To make an impact they just have to beat the average player. And there's bad news on that front: We're there already.
For virtually any poker game there already is a bot that plays better than the average, decent human player. So while poker in general might not yet be solved in a theoretical sense, it's solved enough for a decent bot to beat a decent player.
The same phenomenon was visible when computer chess was developed. Years before Deep Blue beat the reigning world champion Garry Kasparov in 1997, computers were already beating masters and grandmasters.
In fact, the first time a computer reached an Elo rating comparable to a master rank was in 1981, 16 years before a computer eventually beat the world champion.
In poker we're probably midway between those two points right now.
Is This the End of Poker?
With computers challenging the best poker players, one question looms ahead: Are we facing the end of poker? The answer is twofold as one has to distinguish between live and online poker.
It also has to be noted that the problem the poker industry is facing is not new at all. The Libratus victory is not the first time bots demonstrated their ability to beat decent human players.
More than five years ago the Bellagio casino in Las Vegas installed a $2/$4 Limit Hold'em bot that everyone could play against. The bot didn't take any rake; it simply made money by beating the players.
So it was already more than five years ago we were facing bots that could beat average players.
In online poker, decent bots have been around for at least eight years, and all reputable sites prohibit their use. Any players caught using them have their winnings confiscated, and affected players are reimbursed.
So the sensational Libratus victory doesn't change much with regard to the difficulties the industry and the game are facing, except that it puts the spotlight on the remarkable advances poker AI has made over the last two years.
No Changes to Live Poker
As for live poker, not much will change in the foreseeable future. We won't start seeing players using their smart phones to calculate perfect strategies. We won't have bystanders whispering the best move from the rail during the WSOP Main Event.
Some professional players will certainly use highly advanced bots to examine and improve their own strategies and become better at the game. But this is already happening today.
It's very likely that live poker will not be substantially affected by bots even over the next few decades. In the same way that millions of people still play chess and eagerly watch the chess world championships despite not being able to beat the AI, we will still see poker players around a green felt playing for titles, glory and millions of dollars for a long time.
Online Poker Will Have to Evolve
For online poker, on the other hand, things do look a bit bleak. It is up to the poker sites to ensure that poker is provided on a level playing field.
The operators have to ensure humans only play against humans. The reputable operators are doing their best already, but of course it's always possible to get around even the best security measures if you try hard enough.
Online poker right now will not be affected by poker being close to solved by super computers, but to imagine the future of internet poker we again just have to turn to chess. Nobody in their right mind will agree to play a game of chess for a significant amount of money online.
You could, and quite probably would, end up facing an unbeatable AI. Online chess for fun? Sure thing! For money? Never!
But online poker is currently all about money and at some point in the future it is very likely that even the best security measures by the operators will no longer ensure a bot-free environment. It's only a matter of time before online poker will have to evolve to a new form if it doesn't want to perish. And we're not talking about decades here, but 5-10 years.
When asked whether Libratus meant the end of online poker, Dong Kim said: “Not in the near future, but we should be worried. I'm no rocket scientist, but I assume that anything with computers grows exponentially.
“The end is near. It was a good run.”