Ladislav Malecek and Ladislav Peska (Charles University, Prague)

Introduction

Group recommendations are an extension of "single-user" personalized recommender systems (RS), where the final recommendations should comply with the preferences of several group members. An important challenge in group RS is fairness, i.e., no user's preferences should be largely ignored by the RS. Traditional strategies, such as "least misery" or "average rating", tackle the problem of fairness, but they resolve it separately for each item, which may cause a systematic bias against some group members. In contrast, this paper treats both fairness and relevance as rank-sensitive properties of the whole recommendation list. We propose the EP-FuzzDA algorithm, which uses an optimization criterion encapsulating both fairness and relevance. In our experiments, EP-FuzzDA outperforms several state-of-the-art baselines. Another advantage of EP-FuzzDA is its ability to adjust to non-uniform importance of group members, which makes it possible, e.g., to maintain long-term fairness across several recommendation sessions.

EP-FuzzDA

We proposed the EP-FuzzDA algorithm for rank-sensitive, fairness-preserving group recommendations. EP-FuzzDA iteratively maximizes the sum of exactly-proportional shares of per-user relevance scores and therefore jointly optimizes for both relevance and proportionality among users. EP-FuzzDA achieved favorable performance compared to seven baselines. We also showed that EP-FuzzDA can be applied to groups with non-uniform weights and can maintain long-term fairness across multiple recommendation sessions.

We consider a group recommendation strategy fair if all users receive items with approximately the same sum of estimated relevance scores. This should hold for every prefix of the recommendation list.
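The prefix condition can be made concrete with a small check. The sketch below is only an illustration and is not taken from the paper: the function name `prefix_min_max_ratios`, the list-of-lists layout of the relevance scores, and the use of a min/max ratio as the measure of "approximately the same" are all assumptions.

```python
def prefix_min_max_ratios(relevance, ranking):
    """For every prefix of `ranking`, return min/max of per-user relevance sums.

    relevance[u][i] is the estimated relevance of item i for user u
    (assumed non-negative). A ratio close to 1.0 means that all users
    have received roughly the same total relevance up to that position.
    """
    received = [0.0] * len(relevance)  # running relevance sum per user
    ratios = []
    for item in ranking:
        for u in range(len(relevance)):
            received[u] += relevance[u][item]
        highest = max(received)
        # If nobody has received anything yet, treat the prefix as balanced.
        ratios.append(min(received) / highest if highest > 0 else 1.0)
    return ratios


# Hypothetical scores: two users, three items.
scores = [
    [1.0, 0.0, 0.5],  # user 0
    [0.0, 1.0, 0.5],  # user 1
]
print(prefix_min_max_ratios(scores, [0, 1, 2]))  # [0.0, 1.0, 1.0]
```

In this toy example the list as a whole is balanced, but its first prefix is not, which is exactly the situation a per-prefix requirement is meant to expose.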

EP-FuzzDA - Exactly-Proportional Fuzzy D’Hondt’s Aggregation algorithm pseudocode

TOT_c: total relevance of the objects recommended so far (plus the relevance of the currently considered one)

e_u: not-yet-accounted relevance share of the current user (how much have we ignored this user so far?)

  • v_u: weight of the individual user; it can, e.g., compensate for a lack of fairness in previous recommendation sessions

gain_c: sum of per-user relevances of the considered item (only the fair portion of each per-user relevance is counted)

  • For example, if some user is over-represented and their e_u = 0, relevance w.r.t. this user is completely ignored when selecting the next best object.
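The quantities described above can be put together in a short greedy loop. The following is a minimal sketch reconstructed from these descriptions only, not the authors' reference implementation. In particular, it assumes that the unaccounted share is computed as max(0, TOT_c · v_u − relevance already received by u), that the fair portion of a per-user relevance is min(relevance, unaccounted share), that relevance scores are non-negative, and that the weights sum to 1. The function name `ep_fuzzda` and the data layout are hypothetical.

```python
def ep_fuzzda(relevance, weights, k):
    """Greedy sketch of an EP-FuzzDA-style selection of k items.

    relevance[u][c]: estimated relevance of candidate c for user u (>= 0).
    weights[u]:      importance v_u of user u (assumed to sum to 1).
    """
    n_users = len(relevance)
    remaining = list(range(len(relevance[0])))
    received = [0.0] * n_users  # relevance each user has accumulated so far
    total = 0.0                 # total relevance of items selected so far
    selected = []
    for _ in range(min(k, len(remaining))):
        best_item, best_gain = None, -1.0
        for c in remaining:
            # Total relevance if candidate c were added to the list.
            tot_c = total + sum(relevance[u][c] for u in range(n_users))
            gain_c = 0.0
            for u in range(n_users):
                # Share of tot_c the user is entitled to but has not received.
                e_u = max(0.0, tot_c * weights[u] - received[u])
                # Only the fair portion of the user's relevance is counted.
                gain_c += min(relevance[u][c], e_u)
            if gain_c > best_gain:
                best_item, best_gain = c, gain_c
        selected.append(best_item)
        remaining.remove(best_item)
        for u in range(n_users):
            received[u] += relevance[u][best_item]
        total += sum(relevance[u][best_item] for u in range(n_users))
    return selected


# Hypothetical scores: items 0 and 1 suit only user 0, item 2 only user 1.
scores = [
    [1.0, 0.9, 0.0],  # user 0
    [0.0, 0.0, 0.8],  # user 1
]
print(ep_fuzzda(scores, [0.5, 0.5], k=2))  # [0, 2]
```

With uniform weights, user 0 is over-represented after the first pick, so the second pick goes to the item preferred by user 1 even though item 1 has a higher raw score. Shifting the weights (for example to compensate for an earlier session) changes which user counts as under-served.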

Results

For each group type, group sizes from 𝑠=2 to 𝑠=8 were considered, and up to 1000 synthetic groups were generated for each combination of group size and type. During the evaluation, the estimated user-item relevance score matrix is calculated by ALS MF for each fold and forwarded to the group recommendation strategies. Each strategy then produces the top-20 items recommended for the group. Group strategies are evaluated "conditionally" w.r.t. the scores estimated by ALS, i.e., we assume that the relevance estimates of the underlying RS are correct.

Results of the conditional evaluation on the ML1M dataset:

  • Top: uniform weighting scenario for similar and divergent groups of size s=8.

  • Middle: weighted scenario for similar groups and varying group sizes.

  • Bottom: long-term fairness results for similar and divergent groups with group size s=8.

M/M stands for the ratio between minimal and maximal scores per group. Corr and MAE stand for the Pearson correlation coefficient and the mean absolute error between expected and calculated ratios of utility metrics per group member. Best results are in bold, second-best in italic. Results significantly inferior to the best ones (p<0.0001 w.r.t. paired t-test) are marked with an asterisk (*).
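For the weighted scenario, one possible reading of Corr and MAE is sketched below. This is an interpretation, not the evaluation code behind the reported results: it assumes that the "expected ratios" are the user weights and that the "calculated ratios" are each member's share of the group's total utility. All function names are hypothetical.

```python
import math


def utility_shares(utilities):
    """Each member's fraction of the group's total utility (total assumed > 0)."""
    total = sum(utilities)
    return [x / total for x in utilities]


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return cov / (spread_x * spread_y)


def mean_absolute_error(xs, ys):
    """Average absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical group of three members with non-uniform weights.
weights = [0.6, 0.3, 0.1]      # expected ratios
utilities = [6.0, 3.0, 1.0]    # per-member utility of a recommended list
shares = utility_shares(utilities)
print(pearson(weights, shares), mean_absolute_error(weights, shares))
```

Under this reading, a strategy that follows the weights exactly yields a correlation near 1 and an MAE near 0, while a strategy that ignores the weights yields a larger MAE.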

Future work

From the algorithmic point of view, we would like to focus on alternatives for the EP-FuzzDA criterion that optimize rank-sensitive metrics like nDCG more directly and/or vary the fairness vs. relevance tradeoff. Additional evaluation scenarios including more datasets and more base RS should be conducted to corroborate the initial results. Finally, we would like to conduct a user study focused on the long-term fairness preservation in realistic group recommendation settings.