What is the explore-exploit tradeoff?
The exploration-exploitation trade-off is a dilemma we frequently face when choosing between options. Should you choose what you know and get something close to what you expect (‘exploit’), or choose something you aren’t sure about and possibly learn more (‘explore’)? This happens all the time in everyday life: your favourite restaurant or the new one? Your current job or a hunt for a better one? Your normal route home or a different one? You sacrifice one to have the other, which is what makes it a trade-off. Which you should choose depends on how costly it is to gain information about the consequences, how long you’ll be able to take advantage of that information, and how large the benefit to you is.
As well as happening in everyday life, this situation arises often in computer science, where it is studied formally, most famously as the multi-armed bandit problem.
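One common way to make the trade-off concrete in computer science is the multi-armed bandit: you repeatedly pick one of several options (‘arms’) whose payouts are unknown. The sketch below uses the epsilon-greedy strategy, which explores a random arm with a small probability and otherwise exploits the arm with the best average so far. The function name `epsilon_greedy`, the parameter names and the payout probabilities are illustrative assumptions, not taken from any particular library.

```python
import random


def epsilon_greedy(true_means, steps=1000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with the epsilon-greedy strategy.

    true_means: hypothetical payout probability of each arm (hidden from the player).
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to learn more about it.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that looks best so far (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda i: estimates[i])

        # The arm pays 1 with its hidden probability, otherwise 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        counts[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward

    return counts, estimates, total


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated payouts:", [round(e, 2) for e in estimates])
    print("total reward:", total)
```

Setting `epsilon=0` means never exploring, so the player can get stuck on the first arm it tries. Setting `epsilon=1` means never using what it has learned. Values in between trade some short-term reward for information, and how much exploration is worthwhile depends on how many steps remain to benefit from it.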
Examples of the explore-exploit tradeoff
Your small movie production business has had a few hits over the years and you’re trying to work out what your next project should be. You know that a sequel to an old classic would bring mediocre but predictable returns. Alternatively, you could try a hot new idea which is highly unpredictable: it could nosedive, meaning you don’t recover what it cost you to make, or it could be the next Harry Potter series.
You might have seen this trade-off play out in Hollywood. As profits in the industry shrink, the balance shifts towards ‘exploitation’ of known high-performers. In other words, we’ll see The Fast and the Furious 15 (the fast cars series full of guns, muscles, cars, muscle cars, bikinis and cool characters not looking at explosions) rather than more new movies.
Also check out
- Value of Information: Four Examples, LessWrong
- Fundamentals of Learning: The Exploration, Exploitation Trade-off, Tom Stafford
- Algorithms to Live By: The Computer Science of Human Decisions, Brian Christian and Tom Griffiths
- Multi-armed bandit, Wikipedia