How Netflix chooses which movie you watch next.

Different Home pages?

We have all been there, scrolling through the Netflix home page while not being able to decide what movie or series to watch. Some movies seem interesting, but not enough to really dedicate your night to them. It is a common issue that Netflix is fully aware of. Since Netflix relies heavily on its subscribers, providing them with something to watch is of utmost importance. That is where personalized recommendations come into the picture. You might know about this already, since the “For you” tab on the home page has been a visible feature for a while now. However, did you know that Netflix also decides what specific artwork you see for each movie? This means that, for example, your friend who likes action movies sees action-packed artwork for the movie “Inception”, your friend who likes romantic movies sees artwork of the protagonist together with his wife, and your friend who likes Leonardo DiCaprio movies might see artwork containing only him.

Of course, this is not as simple as it seems, since each movie has multiple actors with multiple possible pictures and scenes. The challenge lies in selecting custom artwork for each user that optimizes the click-through rate with limited knowledge about each individual. Furthermore, Netflix faces a couple of main obstacles: it can only show one artwork at a time, and it needs a data-driven way to quantify the impact that changing to a different artwork has on the user's experience (whether they become more or less likely to watch the movie).

Luckily, there exists a model in machine learning that helps tackle these kinds of problems.

Multi-Armed Bandit Model

The name “multi-armed bandit” refers to the classic problem of a gambler facing multiple slot machines (one-armed bandits), each with an unknown probability of winning a prize. The gambler must decide which machine to play in each round to maximize their winnings over time while considering the uncertainty of each machine’s payout rate. In the same way, the multi-armed bandit model balances the exploration of new options with the exploitation of current knowledge to make decisions that maximize a reward over time. The exploitation component favours actions that are believed to lead to the highest reward based on past observations (choose the machine that has paid out the most so far), while the exploration component allows for trying out new actions to gather more information and improve future decisions (choose machines you have not played much to gather data about their payout rates).
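To make the setting concrete, here is a minimal Python sketch of such a bandit environment. The payout probabilities are invented for illustration; in the artwork example, a “pull” would correspond to showing one artwork, and a reward of 1 to the user clicking on it.

```python
import random

# A minimal sketch of the bandit setting: each "arm" is a slot machine with a
# hidden payout probability, and each pull returns a reward of 0 or 1.
# The probabilities below are made up for illustration, not real data.
class BernoulliArm:
    def __init__(self, payout_probability):
        self.payout_probability = payout_probability

    def pull(self):
        # In the artwork example, a "pull" is showing one artwork and a
        # reward of 1 corresponds to the user clicking on it.
        return 1 if random.random() < self.payout_probability else 0

arms = [BernoulliArm(0.2), BernoulliArm(0.5), BernoulliArm(0.7)]
print([arm.pull() for arm in arms])  # one round of pulling every arm once
```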

Coming back to Netflix's selection of artwork, they are dealing with a trade-off between choosing a picture that already leads to a high click-through rate for someone and giving up this “safe choice” by choosing a new picture that will give them more information about how it performs for that type of user. The picture below provides a graphical overview of how the trade-off looks when we plot the distributions of each possible reward/payout rate. In this example, the blue choice (the previously tried option) has a high probability of yielding a reward of around six, whilst the red choice (the option not tried before) has a wider distribution centred around a lower reward of about one. Since we have not yet acted upon this choice, we are more uncertain of where its actual reward lies, hence the wider assumed distribution.

We now have to tackle another issue: how do we decide on the precise balance of this trade-off?

Trade-off algorithms

Luckily, there exist some well-established algorithms that indicate when we should explore and when we should exploit. These algorithms are based on maximizing cumulative reward or, equivalently, minimizing cumulative regret, where regret is the cost of choosing a suboptimal action (showing the user an artwork that is not the optimal one).
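As a rough, made-up illustration of cumulative regret, assume for a moment that we know the true payout rates from the sketch above (in practice we do not):

```python
# Hidden payout rates of the arms (same invented numbers as above).
payout_rates = [0.2, 0.5, 0.7]
best_rate = max(payout_rates)

# Suppose an algorithm chose these arms over five rounds.
chosen_arms = [1, 2, 2, 0, 2]

# Regret per round is the gap between the best arm and the chosen arm.
cumulative_regret = sum(best_rate - payout_rates[arm] for arm in chosen_arms)
print(round(cumulative_regret, 2))  # 0.2 + 0.0 + 0.0 + 0.5 + 0.0 = 0.7
```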

The picture below depicts this process: the chosen algorithm takes a mix of suboptimal and optimal actions until it converges towards the optimal action.

Greedy, epsilon-greedy, and UCB (Upper Confidence Bound) are three such trade-off algorithms.

Greedy Method

This method is a heuristic where the algorithm always selects the arm with the highest estimated reward and never explores other options. This approach is suboptimal because it does not account for the uncertainty in the estimated reward values and may get stuck in a suboptimal solution if the highest estimated reward is not the actual highest reward.
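A minimal sketch of this rule, assuming we keep a running reward estimate per arm (the numbers are invented):

```python
def greedy_choice(estimated_rewards):
    # Always pick the arm with the highest current estimate; never explore.
    return max(range(len(estimated_rewards)), key=lambda arm: estimated_rewards[arm])

# If early pulls happened to make arm 0 look best, greedy keeps choosing it
# even when another arm is actually better.
print(greedy_choice([0.6, 0.4, 0.1]))  # -> 0
```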

Epsilon-Greedy

In this method, the algorithm selects the arm with the highest estimated reward with probability (1-epsilon) and selects a random arm with probability epsilon. The parameter epsilon determines the degree of exploration vs exploitation, with higher values of epsilon leading to more exploration. The caveat of this method is finding a good epsilon for the problem you are dealing with.
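A minimal sketch of the epsilon-greedy rule, using the same invented estimates as before:

```python
import random

def epsilon_greedy_choice(estimated_rewards, epsilon=0.1):
    # With probability epsilon, explore a uniformly random arm;
    # otherwise exploit the arm with the highest current estimate.
    if random.random() < epsilon:
        return random.randrange(len(estimated_rewards))
    return max(range(len(estimated_rewards)), key=lambda arm: estimated_rewards[arm])

# Most rounds this returns arm 0, but roughly 10% of rounds pick a random arm.
print(epsilon_greedy_choice([0.6, 0.4, 0.1], epsilon=0.1))
```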

Upper Confidence Bound

The upper confidence bound (UCB) algorithm uses a more sophisticated approach to balance exploration and exploitation by placing confidence bounds on the estimated reward values, i.e. it adds an extra buffer to the estimated reward of every action. This buffer shrinks as the number of times the particular action has been chosen grows. For example, when we have shown a particular artwork 100 times, we can shrink its confidence interval considerably, since we have much more information about its payout rate than for an artwork that has been shown only once. At each time step, the algorithm selects the arm with the highest upper confidence bound, which takes into account both the estimated reward and the degree of uncertainty in the estimate. This balances exploration and exploitation: arms with a high estimated reward and high uncertainty are selected for exploration, while arms with a high estimated reward and low uncertainty are exploited.
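A minimal sketch of a UCB1-style selection rule (the constant c, the estimates, and the pull counts are all invented; real systems tune these):

```python
import math

def ucb_choice(estimated_rewards, pull_counts, total_pulls, c=2.0):
    # Score each arm by its estimate plus an uncertainty bonus that shrinks
    # as the arm is pulled more often (a UCB1-style bonus).
    scores = []
    for arm, (estimate, count) in enumerate(zip(estimated_rewards, pull_counts)):
        if count == 0:
            return arm  # pull every arm at least once
        bonus = math.sqrt(c * math.log(total_pulls) / count)
        scores.append(estimate + bonus)
    return max(range(len(scores)), key=lambda arm: scores[arm])

# The rarely shown second artwork gets a large bonus, so it can be picked for
# exploration despite a lower raw click-through estimate.
print(ucb_choice([0.6, 0.4, 0.1], pull_counts=[100, 2, 50], total_pulls=152))  # -> 1
```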

There exist multiple other ways to balance the trade-off between exploration and exploitation; however, most of them build on the previously mentioned algorithms.

Why is it relevant?

The multi-armed bandit model and its associated trade-off algorithms have wide-ranging applications beyond choosing artwork for movie recommendations on Netflix. In fact, this framework can be applied to any situation where there is a trade-off between exploring new options and exploiting known options to maximize rewards over time. For example, it can be used in marketing to choose which advertisement to display to a customer, or in medical research to decide which treatment option to provide to a patient. In short, the model and its trade-off algorithms are a powerful tool whenever decisions must be made under uncertainty.

In conclusion, Netflix faces the challenge of selecting personalized artwork for each movie to optimize click-through rates for individual users while having limited knowledge about each individual. To solve this problem, Netflix uses the multi-armed bandit model, which balances the exploration of new options with the exploitation of current knowledge to make decisions that maximize a reward over time. Trade-off algorithms such as Greedy, Epsilon-Greedy, and Upper Confidence Bound determine when to explore and when to exploit. These algorithms help Netflix find the optimal artwork for each user, increasing the likelihood of user engagement and satisfaction. By using data-driven approaches and sophisticated algorithms, Netflix continues to enhance the user experience, making it the go-to platform for movie and series enthusiasts worldwide.

If you are interested in this subject and would like to get deeper into the mathematics, a good follow-up read would be Thompson Sampling in combination with Bayesian statistics.
