Nicolò Felicioni, Maurizio Ferrari Dacrema, Paolo Cremonesi

Politecnico di Milano

Milan, Italy

 

Scenario

A user interface with a carousel layout is an interface where recommendations are organized in multiple rows (called carousels).

This type of interface is usually adopted in music and video streaming services. Here is an example of the Netflix homepage:

[Figure: Netflix homepage with a carousel layout]

Following standard offline evaluation procedures, building a carousel layout consists of:

  1. Evaluating the available carousels independently with standard ranking metrics;
  2. Ranking the carousels according to their independent quality;
  3. Showing the K carousels with the highest independent quality, where K is the number of carousels to show in the UI.
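The three steps above can be sketched as follows. This is a minimal illustration, not the exact evaluation code: the NDCG implementation assumes binary relevance, and the carousel names and quality function are placeholders.

```python
import math

def ndcg(recommended, relevant, cutoff):
    """Standard NDCG at the given cutoff, with binary relevance."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:cutoff])
              if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant), cutoff)))
    return dcg / ideal if ideal > 0 else 0.0

def build_layout(carousels, relevant_items, cutoff, k):
    """carousels: dict mapping carousel name -> ranked list of item ids.

    Step 1: score each carousel independently; steps 2-3: rank the
    carousels by that score and keep the top K for the interface.
    """
    quality = {name: ndcg(items, relevant_items, cutoff)
               for name, items in carousels.items()}
    return sorted(quality, key=quality.get, reverse=True)[:k]
```

Note that each carousel is scored in isolation: nothing in this procedure looks at what the other carousels contain, which is exactly the limitation discussed next.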

Problems with standard evaluation

The problem is that these evaluation procedures do not consider important characteristics of a carousel setting.

For instance, standard one-dimensional ranking metrics assume that the user explores the recommendations sequentially, one carousel at a time:

[Figure: sequential, one-dimensional exploration assumed by standard ranking metrics]

But users will not navigate one carousel at a time. Instead, they will tend to focus on the top-left corner, with decreasing attention towards the right and the bottom:

[Figure: two-dimensional attention pattern, strongest at the top-left corner]

Also, different carousels are generated with different algorithms or come from different providers. Hence, in general, there may be duplicates across carousels, a scenario not considered in standard ranking evaluation:

[Figure: duplicate recommendations appearing across carousels]
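A small sketch of why duplicates matter: per-carousel metrics would credit a relevant item every time it appears, even though the user only sees one new recommendation. The layout and item ids below are made up for illustration.

```python
def duplicate_count(layout):
    """Count recommendations already shown in an earlier carousel.

    layout: list of carousels, each a ranked list of item ids.
    """
    seen, duplicates = set(), 0
    for carousel in layout:
        for item in carousel:
            if item in seen:
                duplicates += 1
            seen.add(item)
    return duplicates

# Example: item 3 repeats in row 2 and item 1 repeats in row 3.
duplicate_count([[1, 2, 3], [3, 4, 5], [1, 6, 7]])  # -> 2
```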

Research Contributions

The first contribution is the definition of a two-dimensional ranking metric, extending the well-known DCG metric. Eq. (1) is one of the most common definitions of the DCG metric, where c is the length of the ranked list (the cutoff) and rel(i) is the relevance value of the recommendation at the i-th position.

Eq. (2) shows our two-dimensional extension, where l is the number of carousels present in the interface, while c is the cutoff of a single carousel. The terms α and β are two weights to be set according to the particular use case. 

[Image: Eq. (1) and Eq. (2)]
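Since the equations themselves are only visible in the image, here is the standard DCG of Eq. (1), followed by a sketch of a two-dimensional extension in the spirit of Eq. (2). The exact discount used in Eq. (2) may differ; the second formula is only illustrative of how α and β can weight the vertical (carousel) and horizontal (in-carousel) position.

```latex
% Eq. (1): standard DCG over a single ranked list with cutoff c
\mathrm{DCG} = \sum_{i=1}^{c} \frac{rel(i)}{\log_2(i+1)}

% Illustrative two-dimensional extension: l carousels, each with
% cutoff c; rel(j, i) is the relevance of the item at position i
% of carousel j, and alpha, beta weight the two directions.
\mathrm{DCG_{2D}} = \sum_{j=1}^{l} \sum_{i=1}^{c}
    \frac{rel(j,i)}{\log_2(\alpha\, j + \beta\, i + 1)}
```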

The second contribution is the proposal of a novel offline evaluation framework, with which we can evaluate each carousel in a context where other carousels are already displayed, accounting for the characteristics of a carousel layout highlighted above.
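The core idea can be sketched as follows. This is a minimal illustration and not the authors' exact framework: the candidate carousel is credited only for relevant recommendations that are not already shown in the fixed carousels, so a row that merely repeats what is above it gets no credit.

```python
def precision_given_fixed(candidate, fixed_carousels, relevant, cutoff):
    """Precision of the candidate carousel, given fixed carousels above it.

    candidate: ranked list of item ids for the carousel being evaluated.
    fixed_carousels: list of carousels already placed in the layout.
    """
    already_shown = {item for carousel in fixed_carousels for item in carousel}
    # Only count relevant items that are new to the user.
    hits = sum(1 for item in candidate[:cutoff]
               if item in relevant and item not in already_shown)
    return hits / cutoff
```

For example, if item 1 is relevant but already appears in the fixed carousel, it no longer contributes to the candidate's score, unlike in an independent evaluation.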

Results

We selected only datasets from domains that tend to use the carousel user interface, such as MovieLens10M and Netflix Prize.

[Figure: evaluation of a carousel with another carousel fixed above it]

For each carousel, we evaluated its individual quality and its quality when there is already another carousel fixed above the one evaluated (see Figure on the right).

We evaluated the quality of a carousel with Mean Average Precision (MAP).
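For reference, this is one common way to compute the Average Precision of a single user's ranked list (with binary relevance); the reported MAP is the mean of this value over all users. Conventions for the denominator vary, so this is a sketch of one of them, not necessarily the exact variant used in the paper.

```python
def average_precision(recommended, relevant, cutoff):
    """Average Precision at the given cutoff, binary relevance."""
    hits, precision_sum = 0, 0.0
    for i, item in enumerate(recommended[:cutoff]):
        if item in relevant:
            hits += 1
            precision_sum += hits / (i + 1)  # precision at position i+1
    # One common normalization: relevant items reachable within the cutoff.
    denom = min(len(relevant), cutoff)
    return precision_sum / denom if denom > 0 else 0.0
```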

 

In the following, we show the results obtained on MovieLens10M. The results on Netflix are in the additional material.

The fixed carousel is SLIM ElasticNet, which is the one with the highest individual quality.

[Table: MAP of each algorithm under the individual and carousel evaluations on MovieLens10M]

Discussion

Some interesting aspects to notice from the results are:

  • Based on the individual evaluation, UserKNN is the best-performing algorithm, while IALS is the best when the fixed carousel is considered;
  • All matrix factorization models gain positions under the carousel evaluation; in particular, MF BPR gains 6 positions;
  • On the other hand, item-based machine learning models tend to lose positions;
  • EASER loses 8 positions in the carousel evaluation, probably because of its similarity to the SLIM algorithm.

These results show that the evaluation of a carousel should also consider how well its recommendations complement the other available carousels.

The relative ranking of the personalized algorithms changes when a carousel is fixed as the first displayed to the user. This indicates that, following our carousel evaluation, we would build a different carousel layout than the one built with the individual evaluation.