We’ll now learn how a team of researchers at Carnegie Mellon University is producing amazing results with strategic reasoning, thanks in part to powerful new memory-intensive systems architectures.
Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy.
To learn more about strategic reasoning advances, please join me in welcoming Tuomas Sandholm,
Professor and Director of the Electronic
Marketplaces Lab at Carnegie Mellon University in Pittsburgh. The discussion is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions.
Here are some excerpts:
Gardner: Tell us about strategic reasoning
and why imperfect information is often the reality that these systems face.
Sandholm: In strategic reasoning we take the word “strategic” very seriously. It means game theoretic,
so in multi-agent settings where you have more than one player, you can't just
optimize as if you were the only actor -- because the other players are going
to act strategically. What you do affects how they should play, and what they
do affects how you should play.
That's what game theory is about. In artificial intelligence (AI), there has been a long history of strategic reasoning. Most AI reasoning -- not all of it, but most of it until about 12 years ago -- was really about perfect-information games like Othello, Checkers, Chess and Go.
And there has been tremendous progress. But these complete-information, or perfect-information, games don't really model real business situations very well. Most business situations involve imperfect information.
Know what you don’t know
So you don't know the other guy's resources, their
goals and so on. You then need totally different algorithms for solving these
games, or game-theoretic solutions that define what rational play is, or
opponent exploitation techniques where you try to find out the opponent's
mistakes and learn to exploit them.
So totally
different techniques are needed, and this has way more applications in reality
than perfect information games have.
Gardner: In business, you don't always know
the rules. All the variables are dynamic, and we don't know the rationale or
the reasoning behind competitors’ actions. People sometimes are playing
offense, defense, or a little of both.
Before we dig into how this is being applied in business circumstances, explain your proof of concept involving poker. Is it Five-Card Draw?
Sandholm: No, we’re working on a much harder poker
game called Heads-Up No-Limit Texas Hold'em as the benchmark. This has become
the leading benchmark in the AI community for testing these application-independent
algorithms for reasoning under imperfect information.
The algorithms
have really nothing to do with poker, but we needed a common benchmark, much
like the IC chip makers have their benchmarks. We compare progress year-to-year and
compare progress across the different research groups around the world. Heads-Up No-Limit Texas Hold'em turned out to be a great benchmark because it is a huge game of imperfect information.
It has 10 to the power of 161 different situations that a player can face. That is a one followed by
161 zeros. And if you think about that, it’s not only more than the number of
atoms in the universe, but even if, for every atom in the universe, you have a whole
other universe and count all those atoms in those universes -- it will still be
more than that.
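As a rough order-of-magnitude check of that comparison -- taking the commonly cited estimate of about 10^80 atoms in the observable universe -- the arithmetic works out as follows:

```latex
% Rough check, assuming roughly 10^{80} atoms in the observable universe
% (a commonly cited order-of-magnitude estimate):
10^{80} \times 10^{80} = 10^{160} < 10^{161}
```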
Gardner: This is as close to infinity as you
can probably get, right?
Sandholm: Ha-ha, basically yes.
Gardner: Okay, so you have this massively
complex potential data set. How do you winnow that down, and how rapidly does
the algorithmic process and platform learn? I imagine that being reactive,
creating a pattern that creates better learning is an important part of it. So
tell me about the learning part.
Three-part harmony
Sandholm: The learning part always interests
people, but it's not really the only part here -- or not even the main part. We
basically have three main modules in our architecture. One computes
approximations of Nash
equilibrium strategies using only the rules of the game as input. In other
words, game-theoretic strategies.
That doesn’t
take any data as input, just the rules of the game. The second part is during
play, refining that strategy. We call that subgame solving.
Then the
third part is the learning part, or the self-improvement part. And there,
traditionally people have done what’s called opponent modeling and opponent
exploitation, where you try to model the opponent or opponents and adjust your
strategies so as to take advantage of their weaknesses.
However, when we go against these absolute best human strategies, the best human players in the world, I felt that they don't have that many holes to exploit and they are experts at counter-exploiting. When you start to exploit opponents, you typically open yourself up for exploitation, and we didn't want to take that risk. In the learning part, the third part, we took a totally different approach than traditionally is taken in AI.
We said, “Okay, we are going to play according to our approximate game-theoretic strategies. However, if we see that the opponents have been able to find some
mistakes in our strategy, then we will actually fill those mistakes and compute
an even closer approximation to game-theoretic play in those spots.”
One way to
think about that is that we are letting the opponents tell us where the holes are
in our strategy. Then, in the background, using supercomputing, we are fixing
those holes.
All three
of these modules run on the Bridges
supercomputer at the Pittsburgh Supercomputing Center (PSC), for which the hardware
was built by Hewlett Packard Enterprise (HPE).
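As a rough illustration of the first module's idea -- computing an approximate equilibrium from nothing but the rules of the game -- here is a minimal regret-matching sketch for a toy zero-sum game. Libratus relied on far more sophisticated counterfactual-regret-style algorithms at supercomputer scale; the game, the code, and all names below are illustrative assumptions, not the team's actual implementation.

```python
# Minimal, illustrative sketch only: regret matching in a tiny zero-sum game
# (rock-paper-scissors). Libratus used far more sophisticated counterfactual
# regret minimization (CFR) variants at supercomputer scale on a vastly larger
# game; this toy shows only the idea of computing an approximate Nash
# equilibrium from the rules (here, a payoff matrix) with no data as input.
import numpy as np

# Row player's payoffs; the column player's payoffs are the negation (zero-sum).
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def strategy_from_regrets(regret_sum):
    """Regret matching: play actions in proportion to their positive regret."""
    positive = np.maximum(regret_sum, 0.0)
    total = positive.sum()
    n = len(regret_sum)
    return positive / total if total > 0 else np.full(n, 1.0 / n)

def approximate_equilibrium(iterations=50_000):
    n = PAYOFF.shape[0]
    regret_row, regret_col = np.zeros(n), np.zeros(n)
    strat_sum_row, strat_sum_col = np.zeros(n), np.zeros(n)

    for _ in range(iterations):
        s_row = strategy_from_regrets(regret_row)
        s_col = strategy_from_regrets(regret_col)
        strat_sum_row += s_row
        strat_sum_col += s_col

        # Expected value of each pure action against the opponent's current mix.
        action_values_row = PAYOFF @ s_col
        action_values_col = -(s_row @ PAYOFF)

        # Accumulate regret: how much better each action would have done
        # than the mixed strategy actually played.
        regret_row += action_values_row - s_row @ action_values_row
        regret_col += action_values_col - s_col @ action_values_col

    # The *average* strategies converge toward a Nash equilibrium.
    return strat_sum_row / iterations, strat_sum_col / iterations

if __name__ == "__main__":
    print(approximate_equilibrium())   # both close to [1/3, 1/3, 1/3]
```

Scaled-up relatives of this regret-driven idea, combined with abstraction and massive parallelism, are what make equilibrium approximation feasible in games as large as Heads-Up No-Limit Texas Hold'em.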
Gardner: Is this being used in any business
settings? It certainly seems like there's potential there for a lot of use
cases. Business competition and circumstances seem to have an affinity for what
you're describing in the poker use case. Where are you taking this next?
Sandholm: So far this, to my knowledge, has not
been used in business. One of the reasons is that we have just reached the
superhuman level in January 2017. And, of course, if you think about your
strategic reasoning problems, many of them are very important, and you don't
want to delegate them to AI just to save time or something like that.
Now that
the AI is better at strategic reasoning than humans, that completely shifts things.
I believe that in the next few years it will be a necessity to have what I call
strategic augmentation. So you can't have just people doing business strategy,
negotiation, strategic pricing, and product portfolio optimization.
You are
going to have to have better strategic reasoning to support you, and so it
becomes a kind of competition. So if your competitors have it, or even if they
don't, you better have it because it’s a competitive advantage.
Gardner: So a lot of what we're seeing in AI
and machine learning is to find the things that the machines do better and
allow the humans to do what they can do even better than machines. Now that you
have this new capability with strategic reasoning, where does that demarcation
come in a business setting? Where do you think that humans will be still
paramount, and where will the machines be a very powerful tool for them?
Human modeling, AI solving
Sandholm: At least in the foreseeable future,
I see the demarcation as being modeling versus solving. I think that humans
will continue to play a very important role in modeling their strategic
situations, just to know everything that is pertinent and deciding what’s not
pertinent in the model, and so forth. Then the AI is best at solving the model.
That's the
demarcation, at least for the foreseeable future. In the very long run, maybe
the AI itself actually can start to do the modeling part as well as it builds a
better understanding of the world -- but that is far in the future.
Gardner: Looking back at what is enabling this, clearly the software, the algorithms, and finding the right benchmark -- in this case the poker game -- are essential. But with that large a potential data set -- the probabilities you mentioned -- the underlying computer systems need to keep up. Where are you in terms of the thresholds that hold you back? Is it a price issue? Is it a performance limit, the amount of time required? What are the limits, the governors to continuing?
Sandholm: It's all of the above, and we are very fortunate that we had access to Bridges; otherwise this wouldn’t have been possible at all. We spent more than a year and needed about 25 million core hours of computing and 2.6 petabytes of data storage.
This amount
is necessary to conduct serious absolute superhuman research in this field --
but it is something very hard for a professor to obtain. We were very fortunate
to have that computing at our disposal.
Gardner: Let's examine the commercialization
potential of this. You're not only a professor at Carnegie Mellon, you’re a founder
and CEO of a few companies. Tell us about your companies and how the research
is leading to business benefits.
Superhuman business strategies
Sandholm: Let’s start with Strategic Machine,
a brand-new start-up company, all of two months old. It’s already profitable,
and we are applying the strategic reasoning technology, which again is
application independent, along with the Libratus technology, the Lengpudashi
technology, and a host of other technologies that we have exclusively licensed
to Strategic Machine. We are doing research and development at Strategic Machine as well, and we
are taking these to any application that wants us.
Such applications include business strategy
optimization, automated negotiation, and strategic pricing. Typically when
people do pricing optimization algorithmically, they assume that either their
company is a monopolist or the competitors’ prices are fixed, but obviously
neither is typically true.
We are looking at how you price strategically, taking into account the opponents’ strategic responses in advance. So you price into the future, instead
of just pricing reactively. The same can be done for product portfolio
optimization along with pricing.
Let's say
you're a car manufacturer and you decide what product portfolio you will offer
and at what prices. Well, what you should do depends on what your competitors
do and vice versa, but you don’t know that in advance. So again, it’s an imperfect-information
game.
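To make that interdependence concrete, here is a tiny, hypothetical two-firm pricing game in Python. The profit numbers and the scenario are made up for illustration -- this is not Strategic Machine's method -- but it shows why the best price depends on the competitor's strategic choice rather than on a fixed assumption about it.

```python
# Hypothetical, purely illustrative example (made-up numbers; not Strategic
# Machine's methods): two car makers each pick a "premium" or "value" price,
# and each firm's profit depends on *both* choices -- so optimizing against a
# fixed competitor price, as in monopolist-style pricing, misses the game.
from itertools import product

ACTIONS = ["premium", "value"]

# PROFITS[(price_A, price_B)] = (profit_A, profit_B) in $M, all hypothetical.
PROFITS = {
    ("premium", "premium"): (60, 60),
    ("premium", "value"):   (20, 80),
    ("value",   "premium"): (80, 20),
    ("value",   "value"):   (40, 40),
}

def best_response(player, rival_action):
    """Best price for firm `player` (0 = A, 1 = B) if the rival's price is fixed."""
    def payoff(action):
        profile = (action, rival_action) if player == 0 else (rival_action, action)
        return PROFITS[profile][player]
    return max(ACTIONS, key=payoff)

def pure_nash_equilibria():
    """Price profiles where each firm is already best-responding to the other."""
    return [(a, b) for a, b in product(ACTIONS, ACTIONS)
            if best_response(0, b) == a and best_response(1, a) == b]

if __name__ == "__main__":
    # Both firms undercutting is the only equilibrium here, even though
    # (premium, premium) would earn both firms more -- pricing is a game.
    print(pure_nash_equilibria())   # [('value', 'value')]
```

In a richer model the firms would also choose which products to offer, prices would be continuous, and each firm would be uncertain about the other's costs and goals -- which is exactly what makes it an imperfect-information game.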
Gardner: And these are some of the most difficult problems that businesses face. They have huge billion-dollar investments that they need to line up behind for these types of decisions. Because of that pipeline, by the time they get to a dynamic environment where they can assess -- it's often too late. So having the best strategic reasoning as far in advance as possible is a huge benefit.
Sandholm: Exactly! If you think about machine
learning traditionally, it's about learning from the past. But strategic
reasoning is all about figuring out what's going to happen in the future. And
you can marry these up, of course, where the machine learning gives the
strategic reasoning technology prior beliefs, and other information to put into
the model.
There are
also other applications. For example, cyber security has several applications,
such as zero-day vulnerabilities. You can run your custom algorithms and
standard algorithms to find them, and what algorithms you should run depends on
what the other opposing governments run -- so it is a game.
Similarly,
once you find them, how do you play them? Do you report your vulnerabilities to
Microsoft? Do you attack with them, or do you stockpile them? Again, your best strategy
depends on what all the opponents do, and that's also a very strategic
application.
And in
upstairs block trading, in finance, it’s the same thing: a few players, very
big, very strategic.
Gaming your own immune system
The most radical application is something that we are
working on currently in the lab where we are doing medical treatment planning
using these types of sequential planning techniques. We're actually testing how
well one can steer a patient's T-cell population to fight cancers, autoimmune
diseases, and infections better by not just using one short treatment plan -- but
through sophisticated conditional treatment plans where the adversary is
actually your own immune system.
Gardner: Or cancer is your opponent, and you
need to beat it?
Sandholm: Yes, that’s right. There are
actually two different ways to think about that, and they lead to different
algorithms. We have looked at it where the actual disease is the opponent --
but here we are actually looking at how do you steer your own T-cell population.
Gardner: Going back to the technology, we've
heard quite a bit from HPE about more memory-driven and edge-driven computing, where
the analysis can happen closer to where the data is gathered. Are these advances
of any use to you in better strategic reasoning algorithmic processing?
Algorithms at the edge
Sandholm: Yes, absolutely! We actually
started running at the PSC on an earlier supercomputer, maybe 10 years ago,
which was a shared-memory architecture. And then with Bridges, which is mostly a
distributed system, we used distributed algorithms. As we go into the future
with shared memory, we could get a lot of speedups.
We have
both types of algorithms, so we know that we can run on both architectures. But obviously, shared memory, if it can fit our models and the dynamic state of the algorithms, is much faster.
Gardner: So the HPE Machine must be of interest
to you: HPE’s advanced concept demonstration model, with a memory-driven
architecture, photonics for internal communications, and so forth. Is that a
technology you're keeping a keen eye on?
Sandholm: Yes. That would definitely be a
desirable thing for us, but what we really focus on is the algorithms and the
AI research. We have been very fortunate in that the PSC and HPE have been able
to take care of the hardware side.
We really don’t
get involved in the hardware side that much, and I'm looking at it from the outside.
I'm trusting that they will continue to build the best hardware and maintain it
in the best way -- so that we can focus on the AI research.
Gardner: Of course, you could help supplement
the cost of the hardware by playing superhuman poker in places like Las Vegas,
and perhaps doing quite well.
Sandholm: Actually here in the live game in Las Vegas they don't allow that type of computational support. On the Internet, AI has become a big problem on gaming sites, and it will become an increasing problem. We don't put our AI in there; it’s against their site rules. Also, I think it's unethical to pretend to be a human when you are not. The business opportunities, the monetary opportunities in the business applications, are much bigger than what you could hope to make in poker anyway.
Listen to the podcast. Find it on iTunes. Get the mobile app. Read a full transcript or download a copy. Sponsor: Hewlett Packard Enterprise.