Proceedings

UMAP 2025 online proceedings can now be viewed from the following links:

TOC Main Proceedings

The ACM OpenTOC service enables visitors to download the articles below at no charge.

Open Main Proceedings

UMAP '25: Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization

Full Citation in the ACM Digital Library

SESSION: Full Papers

Assessing Medical Training Skills via Eye and Head Movements

  • Kayhan Latifzadeh
  • Luis A. Leiva
  • Klen Čopič Pucihar
  • Matjaž Kljun
  • Iztok Devetak
  • Lili Steblovnik

We examined eye and head movements to gain insights into skill development in clinical settings. A total of 24 practitioners participated in simulated baby delivery training sessions. We calculated key metrics, including pupillary response rate, fixation duration, and angular velocity. Our findings indicate that eye and head tracking can effectively differentiate between trained and untrained practitioners, particularly during labor tasks. For example, head-related features achieved an F1 score of 0.85 and an AUC of 0.86, whereas pupil-related features achieved an F1 score of 0.77 and an AUC of 0.85. The results lay the groundwork for computational models that support implicit skill assessment and training in clinical settings by using commodity eye-tracking glasses as a complementary device to more traditional evaluation methods such as subjective scores.
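A minimal sketch of how such a binary skill classifier could be evaluated with F1 and AUC, assuming one row of aggregated gaze/head features per session; the feature set, model choice, and random data below are illustrative assumptions, not the authors' setup.

```python
# Illustrative sketch: classifying trained vs. untrained practitioners from
# aggregated eye/head features, scored with F1 and AUC as in the abstract.
# Feature names and the random data are placeholders, not the paper's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)
# One row per session: [pupillary response rate, fixation duration, angular velocity]
X = rng.normal(size=(24, 3))
y = rng.integers(0, 2, size=24)  # 1 = trained, 0 = untrained

clf = RandomForestClassifier(n_estimators=100, random_state=0)
proba = cross_val_predict(clf, X, y, cv=3, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)
print(f"F1  = {f1_score(y, pred):.2f}")
print(f"AUC = {roc_auc_score(y, proba):.2f}")
```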

Augmenting Personalized Memory via Practical Multimodal Wearable Sensing in Visual Search and Wayfinding Navigation

  • Indrajeet Ghosh
  • Kasthuri Jayarajah
  • Nicholas Waytowich
  • Nirmalya Roy

Working memory involves the temporary retention of information over short periods. It is a critical cognitive function that enables humans to perform various online processing tasks, such as dialing a phone number, recalling misplaced items’ locations, or navigating through a store. However, inherent limitations in an individual’s capacity to retain information often result in forgetting important details during such tasks. Although previous research has successfully utilized wearable and assistive technologies to enhance long-term memory functions (e.g., episodic memory), their application to supporting short-term recall in daily activities remains underexplored. To address this gap, we present Memento, a framework that uses multimodal wearable sensor data to detect significant changes in cognitive state and provide intelligent in situ cues to enhance recall. Through two user studies involving 15 and 25 participants in visual search navigation tasks, we demonstrate that participants receiving visual cues from Memento achieved significantly better route recall, improving recall by approximately 20-23% compared to free recall. Furthermore, Memento reduced cognitive load and review time by 46% while also reducing computation time substantially (3.86 secs vs. 15.35 secs) and achieving on average 75% of the effectiveness of computer vision-based cue selection approaches.

Comparing Cognitive and Affective Theory of Mind for an Assistive Robotics Application

  • Luca Raggioli
  • Antimo Cantiello
  • Raffaella Esposito
  • Alessandra Rossi
  • Silvia Rossi

Human-robot interaction in cooperative and assistive scenarios requires robotic systems to assess the task state and coherently choose their next move. Moreover, it is also fundamental to correctly recognize how the user’s stress and emotional response are changing to offer support appropriately. The robot should be able to adapt to different user reactions, consider the situational context, and display empathetic behaviors that support and encourage users. In this work, we aim to assess the impact of empathetic supporting behaviors on the perception of the robot and the users’ performance during a collaborative task, as opposed to assistive strategies focusing only on the task’s performance. With this objective in mind, we propose a robotic architecture to assist a user in playing a memory game in real time using a Furhat robot. We conducted a user study where 60 participants played with the robot to evaluate the effects of the two types of Theory of Mind on the assistive task and their perception of the robot. To this end, the participants interacted with a robot endowed with either Cognitive or Affective Theory of Mind, to respectively allow the robot to understand intentions and beliefs, or to show empathetic behaviors to improve the collaboration. The two conditions achieved the same results in terms of task performance, but the participants rated the emotionally engaged robot higher in perceived social intelligence.

Disentangling Stakeholder Role and Expertise in User-Centered Explainable AI

  • Maxwell Szymanski
  • Vero Vanden Abeele
  • Katrien Verbert

Identifying explanation needs based on user characteristics has been the focus of human-centred research within XAI for some time. In Ribera et al.’s proposal of user-centred XAI, expertise was used as a proxy for characterising the user and, in turn, guiding explanation design. Since then, the research landscape has evolved to include a broader notion of stakeholders, ranging from AI developers to external regulators to the affected users of AI decisions. However, with this broadening of stakeholder roles, there emerged a pattern of conflating expertise and role, such as the term “end user” being used interchangeably for domain experts using (X)AI for decision-making and lay users impacted by AI decisions, even though the two have vastly different explanatory needs. In this work, we revisit previous surveys with the aim of identifying and classifying stakeholders in the XAI ecosystem. We propose to consistently categorise stakeholders along separate expertise and role dimensions. By disentangling both, we present a framework that highlights the diversity of stakeholder goals and the challenges of aligning explanation design with varied user requirements. Our analysis maps stakeholders onto these dimensions and discusses how using both expertise and role can inform the development of more tailored and effective XAI solutions.

Empowering Recommender Systems based on Large Language Models through Knowledge Injection Techniques

  • Alessandro Petruzzelli
  • Cataldo Musto
  • Marco de Gemmis
  • Giovanni Semeraro
  • Pasquale Lops

Recommender systems (RSs) have become increasingly versatile, finding applications across diverse domains. Large Language Models (LLMs) significantly contribute to this advancement since the vast amount of knowledge embedded in these models can be easily exploited to provide users with high-quality recommendations. However, current RSs based on LLMs have room for improvement. As an example, knowledge injection techniques can be used to fine-tune LLMs by incorporating additional data, thus improving their performance on downstream tasks. In a recommendation setting, these techniques can be exploited to incorporate further knowledge, which can result in a more accurate representation of the items. Accordingly, in this paper, we propose a pipeline for knowledge injection specifically designed for RSs. First, we incorporate external knowledge by drawing on three sources: (a) knowledge graphs; (b) textual descriptions; (c) collaborative information about user interactions. Next, we lexicalize the knowledge, and we instruct and fine-tune an LLM, which can easily return a list of recommendations. Extensive experiments on movie, music, and book datasets validate our approach. Moreover, the experiments show that knowledge injection is particularly needed in domains (i.e., music and books) where the encoded knowledge within LLMs may not be suitable for recommendation tasks, even if such content was used during the training of the model. This finding points to several promising future research directions.
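A minimal sketch of what the "lexicalization" step could look like: structured knowledge about an item is rendered as natural language for use in instruction fine-tuning. The template, field names, and example item are assumptions for illustration, not the paper's exact format.

```python
# Illustrative sketch of lexicalizing knowledge for LLM fine-tuning: KG triples,
# a textual description, and collaborative signals become one training instance.
# The template below is an assumption, not the authors' prompt.

def lexicalize_item(title, kg_triples, description, co_liked):
    facts = "; ".join(f"{p} {o}" for p, o in kg_triples)
    collab = ", ".join(co_liked)
    return (
        f"Item: {title}\n"
        f"Facts: {facts}\n"
        f"Description: {description}\n"
        f"Users who liked this also liked: {collab}"
    )

example = lexicalize_item(
    title="The Matrix",
    kg_triples=[("directed by", "the Wachowskis"), ("genre", "science fiction")],
    description="A hacker discovers reality is a simulation.",
    co_liked=["Blade Runner", "Inception"],
)
print(example)  # one instance for instruction fine-tuning the LLM recommender
```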

“Even explanations will not help in trusting [this] fundamentally biased system”: A Predictive Policing Case-Study

  • Siddharth Mehrotra
  • Ujwal Gadiraju
  • Eva Bittner
  • Folkert van Delden
  • Catholijn M. Jonker
  • Myrthe L. Tielman

In today’s society, where Artificial Intelligence (AI) has gained a vital role, concerns regarding user’s trust have garnered significant attention. The use of AI systems in high-risk domains have often led users to either under-trust it, potentially causing inadequate reliance or over-trust it, resulting in over-compliance. Therefore, users must maintain an appropriate level of trust. Past research has indicated that explanations provided by AI systems can enhance user understanding of when to trust or not trust the system. However, the utility of presentation of different explanations forms still remains to be explored especially in high-risk domains. Therefore, this study explores the impact of different explanation types (text, visual, and hybrid) and user expertise (retired police officers and lay users) on establishing appropriate trust in AI-based predictive policing. While we observed that the hybrid form of explanations increased the subjective trust in AI for expert users, it did not led to better decision-making. Furthermore, no form of explanations helped build appropriate trust. The findings of our study emphasize the importance of re-evaluating the use of explanations to build [appropriate] trust in AI based systems especially when the system’s use is questionable. Finally, we synthesize potential challenges and policy recommendations based on our results to design for appropriate trust in high-risk based AI-based systems.

Familiarizing with Music: Discovery Patterns for Different Music Discovery Needs

  • Marta Moscati
  • Darius Afchar
  • Markus Schedl
  • Bruno Sguerra

Humans have a natural tendency to discover and explore. This tendency is reflected in data from streaming platforms as the amount of previously unknown content accessed by users. Additionally, in domains such as music streaming there is evidence that recommending novel content improves users’ experience with the platform. Therefore, understanding users’ discovery patterns, such as the extent to which, and the way in which, users access previously unknown content, is a topic of relevance for both the scientific community and the streaming industry, particularly the music one. Previous works studied how music consumption differs for users of different traits and looked at diversity, novelty, and consistency over time of users’ music preferences. However, very little is known about how users discover and explore previously unknown music, and how this behavior differs for users of varying discovery needs. In this paper we bridge this gap by analyzing data from a survey answered by users of the major music streaming platform Deezer, in combination with their streaming data. We first address questions regarding whether users who declare a higher interest in unfamiliar music listen to more diverse music, have more stable music preferences over time, and explore more music within the same time window, compared to those who declare a lower interest. We then investigate which type of music tracks users choose to listen to when they explore unfamiliar music, identifying clear patterns of popularity and genre representativeness that vary for users of different discovery needs.

Our findings open up possibilities to infer users’ interest in unfamiliar music from streaming data as well as possibilities to develop recommender systems that guide users in exploring music in a more natural way.

GAL-KARS: Exploiting LLMs for Graph Augmentation in Knowledge-Aware Recommender Systems

  • Giuseppe Spillo
  • Cataldo Musto
  • Matteo Mannavola
  • Marco de Gemmis
  • Pasquale Lops
  • Giovanni Semeraro

In this paper, we propose a recommendation model that exploits a graph augmentation technique based on Large Language Models (LLMs) to enrich the information encoded in its underlying Knowledge Graph (KG). Our work relies on the assumption that the triples encoded in a KG can often be noisy or incomplete, and this may lead to sub-optimal modeling of both the characteristics of items and the users’ preferences. In this setting, graph augmentation can be a suitable solution to improve the quality of the data model and provide users with high-quality recommendations.

Accordingly, in this work, we align with this research line and propose GAL-KARS (Graph Augmentation with LLMs for Knowledge-Aware Recommender Systems). In our framework, we start from a KG, and we design some prompts for querying an LLM and augmenting the graph by incorporating: (a) further features describing the items; (b) further nodes describing the preferences of the users, obtained by reasoning over the items they like. The resulting KG is then passed through a Knowledge Graph Encoder that learns users’ and items’ embeddings based on the augmented KG. These embeddings are finally used to train a recommendation model and provide users with personalized suggestions. As shown in the experimental evaluation, graph augmentation based on LLMs can significantly improve the predictive accuracy of our recommendation model, thus confirming the effectiveness of the model and the validity of our intuitions.
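A minimal sketch of the augmentation step described above: an LLM is asked for additional descriptive features of an item, and the answers are attached to the KG as new nodes before encoding. The `ask_llm` stand-in, its output format, and the toy graph are assumptions, not the authors' implementation.

```python
# Illustrative sketch of LLM-based graph augmentation: feature nodes suggested
# by an LLM are attached to item nodes in the KG before embedding.
import networkx as nx

def ask_llm(prompt: str) -> list[str]:
    # Placeholder: a real system would query an LLM here and parse its answer.
    return ["dystopian", "cyberpunk"]

kg = nx.DiGraph()
kg.add_edge("The Matrix", "science fiction", relation="has_genre")

for item in ["The Matrix"]:
    features = ask_llm(f"List additional descriptive features of the movie {item}.")
    for feat in features:
        kg.add_edge(item, feat, relation="has_feature")  # augmentation edge

print(list(kg.edges(data=True)))  # augmented KG passed to the graph encoder
```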

Generative Framework for Personalized Persuasion: Inferring Causal, Counterfactual, and Latent Knowledge

  • Donghuo Zeng
  • Roberto Legaspi
  • Yuewen Sun
  • Xinshuai Dong
  • Kazushi Ikeda
  • Peter Spirtes
  • Kun Zhang

We hypothesize that optimal system responses emerge from adaptive strategies grounded in causal and counterfactual knowledge. Counterfactual inference allows us to create hypothetical scenarios to examine the effects of alternative system responses. We enhance this process through causal discovery, which identifies the strategies informed by the underlying causal structure that govern system behaviors. Moreover, we consider the psychological constructs and unobservable noises that might be influencing user-system interactions as latent factors. We show that these factors can be effectively estimated. We employ causal discovery to identify strategy-level causal relationships among user and system utterances, guiding the generation of personalized counterfactual dialogues. We model the user utterance strategies as causal factors, enabling system strategies to be treated as counterfactual actions. Furthermore, we optimize policies for selecting system responses based on counterfactual data. Our results using a real-world dataset on social good demonstrate significant improvements in persuasive system outcomes, with increased cumulative rewards validating the efficacy of causal discovery in guiding personalized counterfactual inference and optimizing dialogue policies for a persuasive dialogue system.

Granular Feedback: Leveraging Domain Expertise and Explainable AI to Effectively Steer Models

  • Maxwell Szymanski
  • John Stamper
  • Vero Vanden Abeele
  • Katrien Verbert

The use of large language models (LLMs) for automated content generation has seen a steady rise in recent years in domains such as education. While research has increasingly explored human-AI collaboration, including the use of feedback and control to enable teachers to improve model performance with domain knowledge, no studies have explored the level of detail teachers are willing to provide in their feedback towards LLM systems. In an automated question generation system, we introduce the concept of granular feedback, which allows teachers to provide feedback on generated questions by critiquing individual features revealed by the model (e.g., question difficulty, Bloom's taxonomy), and compare it to a more general but widely used 5-point rating of the question overall. Through in-depth interviews with 16 teachers, we explore how detailed but more time-consuming granular feedback compares to a more general but familiar 5-point rating on the question as a whole. Results show a strong preference for granular feedback over general feedback, driven by factors such as long-term efficiency, personalisation and personal reassurance. Additionally, we highlight several factors that positively influence a user's willingness to give feedback, such as the optional nature of giving feedback and explicit disclosures on model improvement. As the usefulness of granular feedback strongly depended on the features users could give feedback on, we discuss how these features were perceived by participants and how changing them could further improve feedback. To conclude, we propose several design suggestions for granular feedback, such as aligning feedback options with teachers' mental models and providing means to introduce additional contextual information to limit repetition in provided feedback.

Impact of Adaptive Feedback on Learning Programming with a Serious Game in High Schools’ Classes

  • Matthieu Branthôme
  • Sébastien Lallé

This study evaluates the impact of an adaptive feedback system in Pyrates, a programming serious game designed to ease the transition from block-based to text-based programming in high school classes. The adaptive feedback system was implemented to support student learning and lessen teachers’ intervention workload in the classroom. To assess its effectiveness, a field user study was conducted with 190 high school students across two institutions. Results show that students progressed significantly further in the game when using the adaptive feedback system, as compared to playing without feedback, although it did not affect learning gains. We discuss the implications of these results for the design of adaptive feedback in programming serious games.

Legal but Unfair: Auditing the Impact of Data Minimization on Fairness and Accuracy Trade-off in Recommender Systems

  • Salvatore Bufi
  • Vincenzo Paparella
  • Vito Walter Anelli
  • Tommaso Di Noia

Data minimization, required by recent data privacy regulations, is crucial for user privacy, but its impact on recommender systems remains largely unclear. The core problem lies in the fact that reducing or altering the training data of these systems can drastically affect their performance. While previous research has explored how data minimization affects recommendation accuracy, a critical gap remains: How does data minimization impact consumers’ and providers’ fairness? This study addresses this gap by systematically examining how data minimization influences multiple objectives in recommender systems, i.e., the trade-offs between accuracy, user fairness, and provider fairness. Our investigation includes (i) an analysis of how data minimization strategies affect RS performance across these objectives, (ii) an assessment of data minimization techniques to determine those that best balance the trade-off among the considered objectives, and (iii) an evaluation of the robustness of different recommendation models under diverse minimization strategies to identify those that best maintain performance. The findings reveal that data minimization can sometimes undermine provider fairness, while enhancing group-based consumer fairness to the detriment of accuracy. Additionally, different strategies can offer diverse trade-offs for the assessed objectives. The source code supporting this study is available at https://github.com/salvatore-bufi/DataMinimizationFairness.

Mindful Escape: a Mobile Serious Game to Predict the Personality Trait Cooperation

  • José Dias
  • Patrícia Alves
  • Joana Neto
  • Goreti Marreiros

Personality plays a crucial role in predicting preferences, behaviors, and interactions. Its importance in accurately characterizing individuals has led to its application in areas such as movies, music, and tourism. Although personality questionnaires have traditionally been used to measure personality, they are prone to biases, such as inflated or false responses. In response to these limitations, serious games emerge as innovative alternatives for assessing personality by studying the player's behavior. This study developed and evaluated Mindful Escape, a short-duration mobile serious game, as a proof of concept to implicitly measure the personality trait of cooperation. The game adapts concepts from the Prisoner's Dilemma and the Tragedy of the Commons to create an Escape Room environment that encourages both cooperative and competitive interactions. Experiments with real users were performed (n = 78), where significant correlations between the game's metrics and cooperation were identified. Additionally, other traits such as modesty, morality, altruism, and anger also showed correlations. The game's duration exceeded the planned 5 minutes, averaging ca. 10 minutes, mainly due to difficulties related to gameplay by less experienced users, which need to be addressed in the future. Nevertheless, the participants’ feedback was highly positive, highlighting the immersive and engaging experience offered by the game. The results show that short-duration mobile games offer a viable and unobtrusive method for assessing users' detailed personality traits, paving the way to replace traditional personality questionnaires and to integrate such games into personality-based systems.

Personalizing LLM Responses to Combat Political Misinformation

  • Adiba Proma
  • Neeley Pate
  • James Druckman
  • Gourab Ghoshal
  • Ehsan Hoque

Despite various efforts to tackle online misinformation, people inevitably encounter and engage with it, especially on social media platforms. Recent advances in LLMs present an opportunity to develop personalized interventions to address misinformed beliefs, potentially offering more effective approaches than existing non-tailored methods. In this paper, we design and evaluate a personalized LLM agent that considers users’ demographics and personalities to tailor responses that mitigate misinformed beliefs. Our pipeline is grounded in facts through an external Retrieval Augmented Generation (RAG) knowledge base and is able to generate diverse output as a result of the personalization, with an average cosine similarity of 0.538. Our pipeline scores an average rating of 3.99 out of 5 when evaluated by a GPT-4o-mini LLM judge for response persuasiveness. Our methods can be adapted to design similar personalized agents in other domains.
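A minimal sketch of the kind of diversity measurement reported above: average pairwise cosine similarity over personalized responses. TF-IDF vectors are used here for self-containedness; the abstract does not specify the authors' embedding choice, and the example responses are invented.

```python
# Illustrative sketch: quantifying output diversity as the mean pairwise cosine
# similarity of response texts (the abstract reports 0.538 on average).
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

responses = [
    "This claim is not supported by election audits in any state.",
    "Official records from multiple states show no such irregularities.",
    "Independent fact-checkers reviewed this video and found it misleading.",
]
vecs = TfidfVectorizer().fit_transform(responses)
sims = [cosine_similarity(vecs[i], vecs[j])[0, 0]
        for i, j in combinations(range(len(responses)), 2)]
print(f"mean pairwise similarity: {sum(sims) / len(sims):.3f}")  # lower = more diverse
```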

Pilot Trainees Benefit from Modelling and Adaptive Feedback

  • Yalmaz Ali Abdullah
  • Michael Guevarra
  • Minghao Cai
  • Jialiang Yan
  • Matthew E. Taylor
  • Carrie Demmans Epp

Limited training capacity has contributed to a critical shortage of licensed commercial pilots. Adaptive educational technologies and simulators could alleviate current training bottlenecks if these technologies could assess trainee performance and provide appropriate feedback. Agents can be used to assess trainee performance, but there is insufficient guidance on how to provide concurrent feedback in simulation-based learning environments. So, we designed 4 feedback conditions that provide varying degrees of elaboration and used a within-subject study (n = 20) to compare feedback approaches. Trainee performance was best when trainees received highly elaborative feedback that modeled expert behaviour. Variability in participant performance and preferences indicates a need to adapt the feedback type to individual learners and provides insight into the use of concurrent feedback in simulation-based learning environments. Specifically, learners appreciated the expert model because it facilitated a sense of control, which was associated with lower negative affect and lower extraneous cognitive load.

Sentence Encoder-Based Clustering Method for Modeling Students' Learning Programming Behavior

  • Mubina Kamberovic
  • Amina Mevic
  • Senka Krivic

Introductory programming courses are widely known for their difficulty among students. Success in these courses is commonly measured in the form of final grades, which might not capture the challenges students face during their learning process. In this paper, we predict students’ success and their future compiler errors based on previously made errors. Furthermore, we examine the effect of applying two clustering techniques before making the predictions and identify key weeks and errors that have the greatest impact on predictions. Experimental results show that students’ compiler errors observed through the semester are an important predictor of students’ achievement and future struggles. Predictions are further improved using sentence encoder-generated embeddings with the K-Means algorithm. Our study suggests that students’ errors, particularly the most recent ones, enable meaningful clustering that enhances performance prediction after only three weeks of the semester.
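A minimal sketch of the clustering step described here: embed compiler error messages with a sentence encoder and group them with K-Means before using the clusters as features. The encoder model name, cluster count, and error strings are assumptions, not the paper's exact setup.

```python
# Illustrative sketch: sentence-encoder embeddings of compiler errors,
# clustered with K-Means; cluster IDs can then feed a success predictor.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

errors = [
    "';' expected",
    "reached end of file while parsing",
    "cannot find symbol: variable count",
    "incompatible types: String cannot be converted to int",
]
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
embeddings = encoder.encode(errors)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for err, lab in zip(errors, labels):
    print(lab, err)  # syntax-style vs. type/symbol-style errors, roughly
```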

Should We Tailor the Talk? Understanding the Impact of Conversational Styles on Preference Elicitation in Conversational Recommender Systems

  • Ivica Kostric
  • Krisztian Balog
  • Ujwal Gadiraju

Conversational recommender systems (CRSs) provide users with an interactive means to express preferences and receive real-time personalized recommendations. The success of these systems is heavily influenced by the preference elicitation process. While existing research mainly focuses on what questions to ask during preference elicitation, there is a notable gap in understanding what role broader interaction patterns—including tone, pacing, and level of proactiveness—play in supporting users in completing a given task. This study investigates the impact of different conversational styles on preference elicitation, task performance, and user satisfaction with CRSs. We conducted a controlled experiment in the context of scientific literature recommendation, contrasting two distinct conversational styles—high involvement (fast-paced, direct, and proactive with frequent prompts) and high considerateness (polite and accommodating, prioritizing clarity and user comfort)—alongside a flexible experimental condition where users could switch between the two. Our results indicate that adapting conversational strategies based on user expertise and allowing flexibility between styles can enhance both user satisfaction and the effectiveness of recommendations in CRSs. Overall, our findings hold important implications for the design of future CRSs.

"Show Me How": Benefits and Challenges of Agent-Augmented Counterfactual Explanations for Non-Expert Users

  • Aditya Bhattacharya
  • Tim Vanherwegen
  • Katrien Verbert

Counterfactual explanations offer actionable insights by illustrating how changes to inputs can lead to different outcomes. However, these explanations often suffer from ambiguity and impracticality, limiting their utility for non-expert users with limited AI knowledge. Augmenting counterfactual explanations with Large Language Models (LLMs) has been proposed as a solution, but little research has examined their benefits and challenges for non-experts. To address this gap, we developed a healthcare-focused system that leverages conversational AI agents to enhance counterfactual explanations, offering clear, actionable recommendations to help patients at high risk of cardiovascular disease (CVD) reduce their risk. Evaluated through a mixed-methods study with 34 participants, our findings highlight the effectiveness of agent-augmented counterfactuals in improving actionable recommendations. Results further indicate that users with prior experience using conversational AI demonstrated greater effectiveness in utilising these explanations compared to novices. Furthermore, this paper introduces a set of generic guidelines for creating augmented counterfactual explanations, incorporating safeguards to mitigate common LLM pitfalls, such as hallucinations, and ensuring the explanations are both actionable and contextually relevant for non-expert users.

Synthetic Voices: Evaluating the Fidelity of LLM-Generated Personas in Representing People’s Financial Wellbeing

  • Arshnoor Kaur
  • Amanda Aird
  • Harris Borman
  • Andrea Nicastro
  • Anna Leontjeva
  • Luiz Pizzato
  • Dan Jermyn

Large Language Models (LLMs) can impersonate the writing style of authors, characters, and groups of people, but can these personas represent their opinions? If so, it creates opportunities for businesses to obtain early feedback on ideas from a synthetic customer-base. In this paper, we test whether LLM synthetic personas can answer financial wellbeing questions similarly to the responses of a financial wellbeing survey of more than 3,500 Australians. We focus on identifying salient biases of 765 synthetic personas using four state-of-the-art LLMs built over 35 categories of personal attributes. We noticed clear biases related to age, and as more details were included in the personas, their responses increasingly diverged from the survey toward lower financial wellbeing. With these findings, it is possible to understand the areas in which creating synthetic LLM-based customer personas can yield useful feedback for faster product iteration in the financial services industry and potentially other industries.

Task-specific, personalized Automatic Speech Recognition

  • Fahrettin Gökgöz
  • Hussein Hasso

Voice User Interfaces (VUIs) are particularly useful if the operator has to work hands-free or if their cognitive load is very high. This is the case, e.g., when the operator can be easily disturbed by the environment, the operational task induces stress, and there is little or no fault tolerance. However, the factors that contribute to the usefulness of a VUI also complicate its design. Automatic Speech Recognition (ASR) must be robust in noisy environments, under non-optimal microphone conditions and for different types of speech – including stress-induced shouting, hyperarticulation and heavy breathing, among others. Commercially available, generic ASR solutions do not fulfil high robustness requirements under these conditions. However, ASR systems can be made robust if they are tailored to their respective use cases and personalised for specific users. This paper introduces a method to customize a Large Vocabulary Continuous Speech Recognizer (LVCSR) system to achieve such robustness. An LVCSR system includes a language model (LM) and an acoustic model (AM). The customization involves adapting both the LM and the AM to the specific operational context. For LM customization, we employ a Use Case Editor (UCE) that provides an intuitive interface, enabling users to align linguistic models with their unique needs. For AM customization, a Multi-Speaker Text-to-Speech Synthesis (MSTTS) module is used to automatically generate personalized speech data, ensuring the model captures the distinctive characteristics of individual speakers. Together, these adaptations ensure the LVCSR system is configured to meet the demands of challenging environments and diverse users.

The Effect of Nudging Techniques on the Customisation and Usability of Visual Analytics Dashboards

  • Hatim Alsayahani
  • Mohammed Alhamadi
  • Simon Harper
  • Markel Vigo

Visual analytics dashboards have become essential tools for decision-making. However, information overload and mismatches between designers’ expected graph literacy and users’ actual graph literacy can limit their effectiveness. Customisation has been proposed to mitigate these challenges and accommodate diverse user needs. Yet, customising dashboards is often time-consuming; users may not be aware of existing customisation features, or they may not have sufficient technical skills to use them. In this paper, we conduct an experiment (N=50) to examine if we can use nudging techniques to promote short-term surface customisations while not sacrificing the usability of interactive visual analytics dashboards. We found that while nudges do not necessarily increase the use of customisation functionalities, they benefit usability. Specifically, the Social Comparisons nudge supports decision-making, while the Just-in-Time Prompts nudge reduces task completion time. Our findings suggest that nudges should be tailored to graph literacy as users with moderate graph literacy can benefit the most from nudges.

The role of GPT as an adaptive technology in climate change journalism

  • Jia Hua Jeng
  • Gloria Kasangu
  • Alain Starke
  • Khadiga Mahmoud Abdalla Seddik
  • Christoph Trattner

Recent advancements in Large Language Models (LLMs), such as GPT-4o, have enabled automated content generation and adaptation, including summaries of news articles. To date, LLM use in a journalism context has been understudied, but can potentially address challenges of selective exposure and polarization by adapting content to end users. This study used a one-shot recommender platform to test whether LLM-generated news summaries were evaluated more positively than ‘standard’ 50-word news article previews. Moreover, using climate change news from the Washington Post, we also compared the influence of different ‘emotional reframing’ strategies to rewrite texts and their impact on the environmental behavioral intentions of end users. We used a 2 (between: Summary vs. 50-word previews) x 3 (within: fear, fear-hope or neutral reframing) research design. Participants (N = 300) were first asked to read news articles in our interface and to choose a preferred news article, while later performing an in-depth evaluation task on the usability (e.g., clarity) and trustworthiness of different framing strategies. The results showed that evaluations of summaries, while being positive, were not significantly better than those of previews. However, we did observe that a fear-hope reframing strategy of a news article, when paired with a GPT-generated summary, led to higher pro-environmental intentions compared to neutral framing. We discuss the potential benefits of this technology.

UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity

  • Yicheng Fu
  • Raviteja Anantha
  • Prabal Vashisht
  • Jianpeng Cheng
  • Etai Littwin

Generating user intent descriptions from a sequence of user interface (UI) snippets is a core challenge in comprehensive UI understanding and cross-modal text generation. Recent advancements in multi-modal large language models (MLLMs) have led to substantial progress in this area, but their demands for extensive model parameters and computing power, together with their high latency, make them impractical for scenarios requiring lightweight, on-device solutions with low latency or heightened privacy. Additionally, the lack of high-quality datasets has hindered the development of such lightweight models. To address these challenges, we propose UI-JEPA, a novel framework that employs masking strategies to learn abstract UI embeddings from unlabeled UI data through self-supervised learning, combined with an LLM decoder fine-tuned for user intent summarization. We also introduce two new UI-grounded multi-modal datasets, “Intent in the Wild” (IIW) and “Intent in the Tame” (IIT), designed for few-shot and zero-shot UI understanding tasks on mobile phones. IIW consists of 1.7K videos across 219 intent categories, while IIT contains ∼900 videos across 10 categories. We establish the first baselines for these datasets, showing that representations learned using a JEPA-style objective, combined with an LLM decoder, can achieve high-quality user intent summarization that matches the performance of state-of-the-art large MLLMs, but with significantly reduced annotation and deployment resources. Measured by a scoring function that aggregates both n-gram overlap and embedding similarity, UI-JEPA outperforms GPT-4 Turbo and Claude 3.5 Sonnet by 10.0% and 7.2% respectively, averaged across the two datasets. Notably, UI-JEPA achieves this performance with a 50.5x reduction in computational cost and a 6.6x improvement in latency on the IIW dataset. These results underscore the effectiveness of UI-JEPA, highlighting its potential for lightweight, high-performance UI understanding and intent summarization.
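A minimal sketch of a scoring function that aggregates n-gram overlap and embedding similarity, as the abstract describes. The equal weighting, the unigram F1 measure, and the TF-IDF cosine stand-in for learned embeddings are assumptions, not the paper's exact function.

```python
# Illustrative sketch: a combined intent-summary score mixing lexical overlap
# (unigram F1) with a semantic similarity term (TF-IDF cosine as a stand-in).
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def unigram_f1(pred: str, ref: str) -> float:
    p, r = Counter(pred.lower().split()), Counter(ref.lower().split())
    overlap = sum((p & r).values())
    if overlap == 0:
        return 0.0
    prec, rec = overlap / sum(p.values()), overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)

def intent_score(pred: str, ref: str, alpha: float = 0.5) -> float:
    vecs = TfidfVectorizer().fit_transform([pred, ref])
    emb_sim = cosine_similarity(vecs[0], vecs[1])[0, 0]
    return alpha * unigram_f1(pred, ref) + (1 - alpha) * emb_sim  # assumed weighting

print(intent_score("user books a table for dinner",
                   "the user reserves a dinner table"))
```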

Uncertainty in Repeated Implicit Feedback as a Measure of Reliability

  • Bruno Sguerra
  • Viet-Anh Tran
  • Romain Hennequin
  • Manuel Moussallam

Recommender systems rely heavily on user feedback to learn effective user and item representations. Despite their widespread adoption, limited attention has been given to the uncertainty inherent in the feedback used to train these systems. Both implicit and explicit feedback are prone to noise due to the variability in human interactions, with implicit feedback being particularly challenging. In collaborative filtering, the reliability of interaction signals is critical, as these signals determine user and item similarities. Thus, deriving accurate confidence measures from implicit feedback is essential for ensuring the reliability of these signals.

A common assumption in academia and industry is that repeated interactions indicate stronger user interest, increasing confidence in preference estimates. However, in domains such as music streaming, repeated consumption can shift user preferences over time due to factors like satiation and exposure. While literature on repeated consumption acknowledges these dynamics, they are often overlooked when deriving confidence scores for implicit feedback.

This paper addresses this gap by focusing on music streaming, where repeated interactions are frequent and quantifiable. We analyze how repetition patterns intersect with key factors influencing user interest and develop methods to quantify the associated uncertainty. These uncertainty measures are then integrated as consistency metrics in a recommendation task. Our empirical results show that incorporating uncertainty into user preference models yields more accurate and relevant recommendations. Key contributions include a comprehensive analysis of uncertainty in repeated consumption patterns, the release of a novel dataset, and a Bayesian model for implicit listening feedback.
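As a toy illustration of attaching uncertainty to repeated implicit feedback (this is a generic Beta-Bernoulli sketch, not the paper's Bayesian model), one can treat each repeated play as a noisy "completion" event and read the posterior variance as the uncertainty of the preference signal.

```python
# Illustrative sketch (not the authors' model): Beta-Bernoulli posterior over
# whether a user finishes a track, with posterior variance as the uncertainty
# attached to the repeated-listening signal.
completions, skips = 7, 3                   # repeated interactions with one track
alpha, beta = 1 + completions, 1 + skips    # Beta(1, 1) prior

mean = alpha / (alpha + beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(f"preference estimate: {mean:.2f}, uncertainty (variance): {var:.4f}")
# More repeats shrink the variance, raising confidence in the signal, unless
# satiation-aware weighting discounts later repeats (the dynamic the paper studies).
```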

What Is Serendipity? An Interview Study to Conceptualize Experienced Serendipity in Recommender Systems

  • Brett Binst
  • Lien Michiels
  • Annelien Smets

Serendipity has been associated with numerous benefits in the context of recommender systems, e.g., increased user satisfaction and consumption of long-tail items. Despite this, serendipity in the context of recommender systems has thus far remained conceptually ambiguous. This conceptual ambiguity has led to inconsistent operationalizations between studies, making it difficult to compare and synthesize findings. In this paper, we conceptualize the user’s experience of serendipity. To this end, we interviewed 17 participants and analyzed the data following the grounded theory paradigm. Based on these interviews, we conceptualize experienced serendipity as a user experience in which a user unintentionally encounters content that feels fortuitous, refreshing, and enriching. We find that all three components—fortuitous, refreshing and enriching—are necessary and together are sufficient to classify a user’s experience as serendipitous. However, these components can be satisfied through a variety of conditions. Our conceptualization unifies previous definitions of serendipity within a single framework, resolving inconsistencies by identifying distinct flavors of serendipity. It highlights underexposed flavors, offering new insights into how users experience serendipity in the context of recommender systems. By clarifying the components and conditions of experienced serendipity in recommender systems, this work can guide the design of recommender systems that stimulate experienced serendipity in their users, and lays the groundwork for developing a standardized operationalization of experienced serendipity in its many flavors, enabling more consistent and comparable evaluations.

With Friends Like These, Who Needs Explanations? Evaluating User Understanding of Group Recommendations

  • Cedric Waterschoot
  • Raciel Yera Toledo
  • Nava Tintarev
  • Francesco Barile

Group Recommender Systems (GRS) employing social choice-based aggregation strategies have previously been explored in terms of perceived consensus, fairness, and satisfaction. At the same time, the impact of textual explanations has been examined, but the results suggest a low effectiveness of these explanations. However, user understanding remains fairly unexplored, even if it can contribute positively to transparent GRS. This is particularly interesting to study in more complex or potentially unfair scenarios when user preferences diverge, such as in a minority scenario (where group members have similar preferences, except for a single member in a minority position). In this paper, we analyzed the impact of different types of explanations on user understanding of group recommendations. We present a randomized controlled trial (n = 271) using two between-subject factors: (i) the aggregation strategy (additive, least misery, and approval voting), and (ii) the modality of explanation (no explanation, textual explanation, or multimodal explanation). We measured both subjective (self-perceived by the user) and objective understanding (performance on model simulation, counterfactuals and error detection). In line with recent findings on explanations for machine learning models, our results indicate that more detailed explanations, whether textual or multimodal, did not increase subjective or objective understanding. However, we did find a significant effect of aggregation strategies on both subjective and objective understanding. These results imply that when constructing GRS, practitioners need to consider that the choice of aggregation strategy can influence the understanding of users. Post-hoc analysis also suggests that there is value in analyzing performance on different tasks, rather than through a single aggregated metric of understanding.
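For readers unfamiliar with the three aggregation strategies named above, a minimal sketch on toy 1-5 ratings (the data and the approval threshold of 4 are illustrative assumptions):

```python
# Minimal sketch of the three social choice-based aggregation strategies from
# the study, applied to toy ratings (each list = one rating per group member).
ratings = {
    "item_a": [5, 4, 1],   # minority member dislikes item_a
    "item_b": [3, 3, 4],
}

additive = {i: sum(r) for i, r in ratings.items()}                # sum of ratings
least_misery = {i: min(r) for i, r in ratings.items()}            # worst-off member
approval = {i: sum(s >= 4 for s in r) for i, r in ratings.items()}  # votes >= 4

print(additive)      # {'item_a': 10, 'item_b': 10}
print(least_misery)  # {'item_a': 1, 'item_b': 3}  -> protects the minority member
print(approval)      # {'item_a': 2, 'item_b': 1}
```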

SESSION: Short Papers

Addressing Personalized Diversity in Eyewear Recommendation: a Lenskart Case Study

  • Lalit Kishore Vyas
  • Ludovico Boratto

This study addresses the challenge of limited diversity in recommender systems on e-commerce category pages, which often leads to reduced user engagement and satisfaction. Recognizing the limitations of traditional Factorization Machines (FM) in generating diverse recommendations, we propose a personalized diversity approach that combines re-ranking strategies with FM, enhanced by Generalist-Specialist (GS) scores to tailor diversity to individual user preferences. The re-ranking strategies explored include Maximal Marginal Relevance (MMR) and Determinantal Point Processes (DPP). Our results show improved balance between relevance and personalized diversity in offline experiments. Additionally, we investigate an alternative approach to personalized diversity through a contextual bandit model (LinUCB), where diversity emerges by balancing exploration and exploitation in predicted preferences. This evaluation highlights LinUCB’s ability to anticipate diverse recommendations by simulating adaptive responses without relying on active user feedback, offering a contrast to traditional re-ranking methods.
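A minimal sketch of the MMR re-ranking idea used here: each step selects the candidate with the best trade-off between relevance (e.g., an FM score) and dissimilarity to already-selected items. The scores, similarity matrix, and trade-off weight below are toy values.

```python
# Illustrative sketch of Maximal Marginal Relevance (MMR) re-ranking.
import numpy as np

def mmr(relevance, similarity, k, lam=0.5):
    """lam balances relevance (lam=1) against diversity (lam=0)."""
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            max_sim = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * max_sim
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

relevance = np.array([0.9, 0.85, 0.4])
similarity = np.array([[1.0, 0.95, 0.1],
                       [0.95, 1.0, 0.1],
                       [0.1, 0.1, 1.0]])
print(mmr(relevance, similarity, k=2))  # [0, 2]: item 1 is too similar to item 0
```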

Bridging Preferences: Multi-Stakeholder Insights on Ideal News Recommendations

  • Thomas Elmar Kolb
  • Irina Nalis
  • Julia Neidhardt

In the evolving realm of recommender systems, our study contributes to the understanding of potential improvements in news recommendation beyond accuracy. Central to our research is the integration of insights from news industry experts and prospective readers, compared with automated news recommendations. We conducted a labeling study with 168 articles, using Best-Worst Scaling (BWS) for ranking and topic modeling. This approach enabled a thorough examination of stakeholder expectations for ideal reading recommendations, specifically by investigating the gap between stated and revealed preferences. Our findings show alignment in ranking behavior among journalists, prospective readers, and the BM-25 algorithm. However, preferences for different beyond-accuracy measures varied. Accompanying this work, a corpus of news articles and the labeled rankings have been made available.
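A minimal sketch of count-based Best-Worst Scaling scoring, the standard way BWS judgments are turned into rankings; whether the authors used exactly this count estimator is an assumption, and the judgments below are toys.

```python
# Illustrative sketch of count-based BWS: an article's score is how often it was
# chosen "best" minus how often it was chosen "worst", normalized by appearances.
from collections import Counter

# Each judgment: (articles shown, article chosen best, article chosen worst)
judgments = [
    (("a1", "a2", "a3", "a4"), "a1", "a4"),
    (("a1", "a2", "a3", "a4"), "a2", "a4"),
    (("a1", "a3", "a4", "a2"), "a1", "a3"),
]

best, worst, shown = Counter(), Counter(), Counter()
for items, b, w in judgments:
    best[b] += 1
    worst[w] += 1
    shown.update(items)

scores = {a: (best[a] - worst[a]) / shown[a] for a in shown}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # ranked articles
```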

Can Path-Based Explainable Recommendation Methods based on Knowledge Graphs Generalize for Personalized Education?

  • Neda Afreen
  • Giacomo Balloccu
  • Ludovico Boratto
  • Gianni Fenu
  • Francesca Maridina Malloci
  • Mirko Marras
  • Andrea Giovanni Martis

Knowledge graphs enable transparent reasoning in recommender systems. While widely studied in other domains, the generalizability of reasoning methods over knowledge graphs to education remains underexplored due to data and evaluation inconsistencies. In this paper, we investigate three classes of explainable reasoning methods for course recommendation. Comparing them with state-of-the-art baselines, we assess utility, beyond-utility, and explainability metrics. Our results show that methods from the generative class perform well in utility, coverage, and explanation diversity, while baselines are still competitive in some beyond-utility metrics under sparsity. With lower sparsity, the gap among methods decreases. Code: https://bit.ly/kg-reasoning-for-pers-course-recsys.

Circumventing Misinformation Controls: Assessing the Robustness of Intervention Strategies in Recommender Systems

  • Royal Pathak
  • Francesca Spezzano

Recommender systems are essential on social media platforms, shaping the order of information users encounter and facilitating news discovery. However, these systems can inadvertently contribute to the spread of misinformation by reinforcing algorithmic biases, fostering excessive personalization, creating filter bubbles, and amplifying false narratives. Recent studies have demonstrated that intervention strategies, such as Virality Circuit Breakers and accuracy nudges, can effectively mitigate misinformation when implemented on top of recommender systems. Despite this, existing literature has yet to explore the robustness of these interventions against circumvention—where individuals or groups intentionally evade or resist efforts to counter misinformation. This research aims to address this gap, examining how well these interventions hold up in the face of circumvention tactics. Our findings highlight that these intervention strategies are generally robust against misinformation circumvention threats when applied on top of recommender systems.

Effects of Quantitative Explanations on Fairness Perception in Group Recommender Systems

  • Patrik Dokoupil
  • Ladislav Peska

Group recommender systems (GRS) aim to deliver recommendations to groups of individuals, assisting them in planning activities such as going to the cinema with friends, organizing a family vacation, or dining out with colleagues. Unlike traditional recommender systems (RS), GRS must account for the preferences of multiple individuals, often balancing potentially conflicting goals. In this context, it is crucial to provide recommendations that are perceived as fair by all group members. While numerous aggregation strategies have been proposed, understanding users’ perspectives on fairness remains an open challenge. In this paper, we present the results of a user study in which real participants acted as external judges, evaluating the fairness of group recommendations. The study investigates the impact of quantitative explanations, conditioned by specific GRS and group types, on fairness perception. Our findings suggest that, without additional information, the task may be too difficult for users, and their ability to distinguish between different group types is limited, further underscoring the importance of explanations. Study data are available from https://osf.io/9fpyr/.

Enhancing Digital Narrative Medicine through Emotion Analysis in Conversational Agents

  • Mariagrazia Miccoli
  • Berardina Nadja De Carolis
  • Giuseppe Palestra
  • Aurora Toma

This paper presents the development of CArEN (Conversational AgEnt supporting Narrative medicine) that integrates a text-based emotional recognition module to personalize therapeutic pathways in the context of Narrative-Based Medicine (NBM). NBM combines traditional medicine, therapies, symptom monitoring, and vital parameters detection with a conversation-based approach that allows considering not only physical well-being but also the psychosocial and emotional impact of illness on the patient’s life. A study was carried out to evaluate the models’ effectiveness in real-world contexts and collect user feedback on the conversational agent’s performance and empathic support. The results demonstrate good accuracy in emotion recognition and positive user feedback, highlighting the conversational agent’s potential as an effective means of supporting narrative medicine techniques.

Exploring Persuasive Engagement to Reduce Over-Reliance on AI-Assistance in a Customer Classification Case

  • Muhammad Raees
  • Vassilis-Javed Khan
  • Konstantinos Papangelis

Users often over-rely on AI-assisted decisions without analytically engaging with them, even in practical domains. In this work, we explore persuading users to analytically engage with AI assistance to reduce their over-reliance, using a complex business case of customer classification. We explore the effect of persuasive cognitive engagement through explanations and communicated system uncertainty, examining the behavior of participants with diverse expertise. We leverage their feedback and objective behavior to understand their perception of the AI's performance. Our findings show a contrast between participants' subjective and objective behavior, indicating reliance on AI assistance that is inappropriate given the perceived system performance. However, we observe the positives of interactive cognitive engagement and identify further directions for deeper insights into expert domains with personalized AI assistance and behavioral persuasion.

GNN’s FAME: Fairness-Aware MEssages for Graph Neural Networks

  • Erasmo Purificato
  • Hannan Javed Mahadik
  • Ludovico Boratto
  • Ernesto William De Luca

Graph Neural Networks (GNNs) have shown success in various domains but often inherit societal biases from training data, limiting their real-world applications. Historical data can contain patterns of discrimination related to sensitive attributes like age or gender. GNNs can even amplify these biases due to their topology and message-passing mechanism, where nodes with similar sensitive attributes tend to connect more frequently. While many studies have addressed algorithmic fairness in machine learning through pre-processing and post-processing techniques, few have focused on bias mitigation within the GNN training process.

In this paper, we propose FAME (Fairness-Aware MEssages), an in-processing bias mitigation technique that modifies the GNN training’s message-passing algorithm to promote fairness. By incorporating a bias correction term, the FAME layer adjusts messages based on the difference between the sensitive attributes of connected nodes. FAME is compatible with Graph Convolutional Networks, and a variant called A-FAME is designed for attention-based GNNs. Experiments conducted on three datasets evaluate the effectiveness of our approach against three classes of algorithms and six models, considering two notions of algorithmic fairness. Results show that the proposed approaches produce accurate and fair node classifications. These results provide a strong foundation for further exploration and validation of this methodology. The source code is available at https://github.com/HannanJaved/FAME.
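A minimal sketch of a fairness-aware message-passing layer in plain PyTorch, inspired by the description above; the exact form of the correction term (damping messages between nodes that share a sensitive value, to counter homophily) is an assumption, not the authors' code.

```python
# Illustrative sketch (not the authors' implementation) of bias-corrected
# message passing: messages are adjusted by a term that depends on whether the
# connected nodes' sensitive attributes differ.
import torch

def fame_like_layer(h, adj, s, W, gamma=0.5):
    """h: node features [N, d]; adj: adjacency [N, N]; s: sensitive attrs [N]."""
    diff = (s.unsqueeze(0) != s.unsqueeze(1)).float()   # 1 where attributes differ
    weights = adj * (gamma + (1 - gamma) * diff)        # damp same-group edges
    deg = weights.sum(dim=1, keepdim=True).clamp(min=1.0)
    return torch.relu((weights @ h) / deg @ W)          # normalized aggregation

h = torch.randn(4, 8)                     # 4 nodes, 8 features
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1],
                    [1, 0, 0, 1], [0, 1, 1, 0]], dtype=torch.float)
s = torch.tensor([0, 0, 1, 1])            # binary sensitive attribute
W = torch.randn(8, 8)
print(fame_like_layer(h, adj, s, W).shape)  # torch.Size([4, 8])
```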

Integrating Expert Knowledge With Automated Knowledge Component Extraction for Student Modeling

  • Rafaella Sampaio de Alencar
  • Mehmet Arif Demirtas
  • Adittya Soukarjya Saha
  • Yang Shi
  • Peter Brusilovsky

Knowledge tracing is a method to model students’ knowledge and enable personalized education in many STEM disciplines, such as mathematics and physics, but it has so far remained challenging in computing disciplines. One key obstacle to successful knowledge tracing in computing education lies in the accurate extraction of knowledge components (KCs), since multiple intertwined KCs are practiced at the same time in programming problems. In this paper, we address the limitations of current methods and explore a hybrid approach for KC extraction, which combines automated code parsing with an expert-built ontology. We use an introductory (CS1) Java benchmark dataset to compare its KC extraction performance with traditional extraction methods, using a state-of-the-art evaluation approach based on learning curves. Our preliminary results show considerable improvement over traditional methods of student modeling. The results indicate the opportunity to improve automated KC extraction in CS education by incorporating expert knowledge into the process.

Learning User Interface Preferences via Contextual Discrete Choice Experimentation

  • William Fisher
  • Jacob Rhyne
  • Mark Bailey
  • Joseph Morgan
  • Ryan Lekivetz

Designing effective user interfaces (UIs) is a complex decision-making process that often relies on usability testing and understanding users’ preferences. However, user preferences can vary widely based on contextual information (such as age, nationality, or use-case of the software system), posing a significant challenge in creating universally effective studies. To address this, we propose a novel framework of contextual discrete choice experimentation (DCE) to learn the relationship between contextual information and user preferences, enabling the creation of more statistically efficient studies for new cohorts of participants. In this framework, users are presented with a sequence of questions where they choose their preferred option between two or more design alternatives. This preference data, combined with contextual information, is used to develop a statistical model that recommends UI designs to new or existing users. We detail the methodology for designing contextual DCEs and demonstrate its application with a real-world example involving users of a statistical software system. Our results indicate that the contextual DCE framework effectively captures user preferences and provides personalized UI recommendations.
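A minimal sketch of learning from contextual discrete choices: each trial pairs two design alternatives, and a logistic model over (design-difference x context) features predicts which a user with a given context prefers. The feature encoding and simulated data are assumptions for illustration, not the authors' statistical model.

```python
# Illustrative sketch: a binary-choice logit with context interaction terms.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
design_diff = rng.normal(size=(n, 3))       # feature difference: option A - option B
context = rng.integers(0, 2, size=(n, 1))   # e.g., novice (0) vs. expert (1) user

# Simulated ground truth: experts weigh design feature 0 more heavily.
utility = design_diff[:, 0] * (1 + 2 * context[:, 0]) + design_diff[:, 1]
chose_a = (utility + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.hstack([design_diff, design_diff * context])  # main + interaction terms
model = LogisticRegression().fit(X, chose_a)
print(model.coef_.round(2))  # interaction weights reveal context-dependent taste
```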

Leveraging LLMs to Explain the Consequences of Recommendations

  • Sebastian Lubos
  • Michael Gartner
  • Alexander Felfernig
  • Reinhard Willfort

Recommender systems help users make better decisions by suggesting products that match their preferences. However, users often do not understand why certain products are recommended, which can reduce trust and satisfaction. While explanations address this issue, they often fail to communicate the individual impact the decision for an item will have. To address this, we present an LLM-based framework for generating consequence-based explanations. These explanations provide comprehensible personalized insights into the positive and negative consequences of user decisions. To support the assessment and selection of the most effective prompting strategy, we introduce evaluation metrics tailored to consequence-aware explanations and systematically compare different prompting strategies for an apartment recommendation example.

LLMs and Emotional Intelligence: Evaluating Emotional Understanding Through Psychometric Tools

  • Dhairya Dalal
  • Gaurav Negi
  • Davide Picca

This study investigates the Emotional Intelligence (EI) of LLMs by evaluating their ability to replicate human-like emotional reasoning and self-assessment using established psychometric tools. Prior research has utilized standardized tests to evaluate LLMs in unambiguous emotional reasoning contexts, where a single correct answer is expected. However, emotional situations are rarely simple, and subjective evaluation can often produce equally valid alternative interpretations. This preliminary study proposes to investigate LLM EI using psychometric tests developed for humans, where the emotional reasoning contexts are more complex, subject to multiple interpretations, and often subjective. Four mainstream LLMs (GPT-4o, Mixtral, Gemma-2, and Llama-3.1) are evaluated using two well-established EI assessments: the Trait Emotional Intelligence Questionnaire (TEIQUE) and the Situational Test of Emotional Understanding (STEU). TEIQUE assesses self-perceived emotional capabilities across the facets of well-being, self-control, emotionality, and sociability, while STEU evaluates situational emotional reasoning in real-world contexts. The experiments reveal distinctive self-awareness traits in LLMs, varying levels of safety alignment, and their ability to interpret emotional situations. A human study is conducted to evaluate the reasonableness of LLM emotional appraisals that deviate from expected responses. The results highlight the potential of LLMs to act as tools for emotional interpretation, transcending the deterministic outputs of traditional NLP systems. Finally, this study concludes by discussing the need for non-deterministic and more sophisticated EI assessments that better align with human EI.

Mitigating Risks in Marketplace Semantic Search: A Dataset for Harmful and Sensitive Query Alignment

  • Filip Spacek
  • Vojtech Vancura
  • Pavel Kordik

Semantic search engines have transformed user interaction with online marketplaces, creating a need for effective methods to moderate harmful and sensitive content. Existing approaches often struggle with ambiguous query intent, content classification challenges, and noisy data, making it difficult to ensure user safety while maintaining relevance in search results. To address these challenges, we introduce SHIELD, a synthetic dataset designed for classifying user queries into harmful, sensitive, and normal categories. SHIELD is generated using a large language model with a structured taxonomy, followed by automated filtering using a reward model to ensure data quality and relevance. To demonstrate SHIELD’s utility, we evaluate three classification approaches: (1) BM25, a computationally efficient retrieval-based method; (2) a sentence transformer with FAISS, which improves classification by leveraging semantic embeddings; and (3) MoralBERT, a fine-tuned transformer model trained on SHIELD for direct query classification. We discuss the trade-offs among these methods in terms of accuracy, resource requirements, and explainability, highlighting their applicability in real-world semantic search systems. This work provides a foundation for developing AI-driven content moderation systems in semantic search, offering insights into the trade-offs between efficiency, accuracy, and explainability. The SHIELD dataset, pre-trained model, and generation details are publicly available to support future research and real-world deployment: https://github.com/flpspacek/SHIELD.

Warning: This paper contains examples of language and content that some readers may find offensive. These examples are included solely to illustrate challenges in content moderation and to highlight the importance of ethical considerations in semantic search systems.
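To illustrate the second baseline described above (sentence embeddings with FAISS), here is a minimal sketch of nearest-neighbour query classification; the encoder name and the toy labeled set are assumptions, not the SHIELD data.

```python
# Sketch: classify marketplace queries by majority vote of their nearest
# labeled neighbours in a sentence-embedding space indexed with FAISS.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed; any encoder works

train_queries = ["buy kitchen knife set", "cheap fireworks wholesale",
                 "used baby stroller", "how to make explosives"]
train_labels  = ["normal", "sensitive", "normal", "harmful"]

emb = encoder.encode(train_queries, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])  # inner product = cosine (normalized)
index.add(np.asarray(emb, dtype="float32"))

def classify(query: str, k: int = 3) -> str:
    q = encoder.encode([query], normalize_embeddings=True).astype("float32")
    _, idx = index.search(q, k)
    votes = [train_labels[i] for i in idx[0]]
    return max(set(votes), key=votes.count)  # majority vote of neighbours

print(classify("large hunting knife"))
```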

"Strangers in a new culture see only what they know": Evaluating Effectiveness of GPT-4 Omni for Detecting Cross-Cultural Communication Norm Violations

  • Tzu-Yu Weng
  • Hanna Alzughbi
  • Isaac Rabago
  • Erin Arévalo Chaves
  • Erik Vagil
  • Nancy Fulda
  • Erin Ash
  • Mainack Mondal
  • Bart Knijnenburg
  • Xinru Page

Cross-cultural communication often results in misaligned norms and expectations, leading to misunderstandings or harm. As the internet increasingly facilitates cross-cultural communication online, such misalignments also increase. However, there is an opportunity to use Large Language Models (LLMs) to detect such misunderstandings and assist in addressing them. To that end, this study investigates whether cross-cultural norm violations can be detected and mitigated using popular LLMs. Using a set of carefully constructed cross-cultural communication scenarios, half of which present norm violations, we test the ability of OpenAI’s GPT-4 Omni (GPT-4o) model to identify cross-cultural communication norm violations. We find that GPT-4o classification accuracy varies by the stated age, gender, and nationality of the communicators described in the scenarios, suggesting a lack of fairness and a potential cultural gap in GPT-4o’s detection.

Training Green and Sustainable Recommendation Models: Introducing Carbon Footprint Data into Early Stopping Criteria

  • Giuseppe Spillo
  • Allegra De Filippo
  • Emanuele Fontana
  • Michela Milano
  • Giovanni Semeraro

With the growing focus on Green AI, there is an urgent need for algorithms designed to minimize their environmental impact while maintaining satisfying performance. In this paper, we introduce a novel early stopping strategy that considers carbon footprint data while training a recommendation algorithm. In particular, during the training phase, our criterion analyzes, epoch by epoch, the improvement in predictive accuracy and compares it to the increase in carbon emissions. When accuracy improves at a rate that no longer justifies the additional emissions, training is stopped.

In the experimental evaluation, we showed that our strategy could significantly reduce the carbon footprint of several state-of-the-art recommendation models, with a limited decrease in accuracy and fairness. While more work is needed to automatically balance the trade-off between accuracy and emissions, this paper sheds light on the need for more sustainable recommendation models and takes a significant step toward designing green training strategies.
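The following is a hedged sketch of the general idea (not the authors' exact criterion): stop training once the accuracy gained per unit of CO2 emitted drops below a threshold. EmissionsTracker comes from the codecarbon package; the training and validation helpers are trivial stand-ins for a real pipeline.

```python
# Sketch: carbon-aware early stopping. train_one_epoch/validate are stand-ins.
import random
from codecarbon import EmissionsTracker

def train_one_epoch(model):   # stand-in for a real epoch of training
    model["epochs"] += 1

def validate(model):          # stand-in: accuracy with diminishing returns
    return 0.9 - 0.5 / (1 + model["epochs"]) + random.uniform(-0.005, 0.005)

def carbon_aware_training(model, max_epochs=50, min_gain_per_kg=1.0):
    prev_acc, total_kg = 0.0, 0.0
    for epoch in range(max_epochs):
        tracker = EmissionsTracker(save_to_file=False)
        tracker.start()
        train_one_epoch(model)
        kg = tracker.stop() or 1e-9   # kg CO2-eq emitted this epoch
        total_kg += kg

        acc = validate(model)
        gain = acc - prev_acc
        # Trade-off check: stop once accuracy gains no longer justify emissions.
        if epoch > 0 and gain / kg < min_gain_per_kg:
            break
        prev_acc = acc
    return model, total_kg

model, kg = carbon_aware_training({"epochs": 0})
print(f"trained {model['epochs']} epochs, {kg:.6f} kg CO2-eq")
```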

Unveiling Creativity in Student Code: A Gaussian Mixture Model Approach

  • Veronika Bogina
  • Arnon Hershkovitz
  • Noam Koenigstein

Creativity, characterized by the capacity to generate novel and valuable ideas or solutions through imaginative thinking and unique problem-solving, differs widely between individuals. Despite its importance, this variability is often overlooked in research on personalization in education. In this study, our goal is to personalize creativity within a programming learning platform for school students. Leveraging a unique dataset of students’ initial coding attempts, we employ a Gaussian Mixture Model to identify distinct creativity profiles among learners. By integrating these insights into user modeling, this work lays the foundation for developing personalized programming curricula tailored to each student’s creative strengths, highlighting the potential of creativity-aware adaptive systems in education. We make our data and code publicly available at: https://github.com/sveron/Creativity.
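As an illustration of the modeling step, the sketch below fits Gaussian Mixture Models over per-student features and selects the number of creativity profiles by BIC; the two features and the synthetic data are assumptions, not the released dataset.

```python
# Sketch: identify creativity profiles with a GMM, choosing k by BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Hypothetical per-student features: solution novelty, solution value.
X = np.vstack([rng.normal([0.2, 0.3], 0.1, (100, 2)),
               rng.normal([0.7, 0.6], 0.1, (60, 2))])

models = {k: GaussianMixture(n_components=k, random_state=0).fit(X)
          for k in range(1, 6)}
best_k = min(models, key=lambda k: models[k].bic(X))   # lowest BIC wins
profiles = models[best_k].predict(X)                   # profile per student
print("profiles found:", best_k)
```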

Why Context Matters: Exploring How Musical Context Impacts User Behavior, Mood, and Musical Preferences

  • Anna Hausberger
  • Emilia Parada-Cabaleiro
  • Markus Schedl

Music consumption is shaped by both internal factors (e.g., mood, motivation) and external factors (e.g., activity, social environment), which together influence listeners’ behavior (e.g., focus, song skips) and reactions (e.g., mood changes). While prior research has explored context-aware music listening through real-life or survey-based studies with limited available context information, we introduce a dataset comprising 216 music listening sessions collected in real-world settings through a custom-built Android mobile application designed to assess a wide range of contextual factors. The dataset captures static (e.g., activity, social environment, motivation) and dynamic (e.g., mood changes) contextual factors, along with music interaction data (e.g., skipped or fully listened songs), listening focus levels, and participant traits (e.g., demographics, music education, listening preferences, personality).

Our analysis highlights key insights into how different contextual factors influence user behavior and mood: music listening sessions grouped by context differ significantly in listening behaviors (focus, song skipping, and session genre diversity) and in mood changes (happiness, sadness, stress, and energy). Furthermore, we explore the correlations between personality traits and listening behaviors (mean skip rate and genre diversity). Ultimately, our findings emphasize the importance of understanding context, as different situations lead to distinct music preferences and have varying impacts on user behavior and emotional responses.

SESSION: Industry Papers

Adaptive User Modeling in Visual Merchandising: Balancing Brand Identity with Operational Efficiency

  • Potito Aghilar
  • Vito Walter Anelli
  • Andrea Lops
  • Fedelucio Narducci
  • Azzurra Ragone
  • Sabino Roccotelli
  • Michelantonio Trizio

Maintaining a consistent brand identity across a global network of retail stores while adhering to local constraints has long challenged Visual Merchandisers. Legacy processes, often reliant on subjective “by-eye” adjustments, can drive up operational costs and lead to inconsistent in-store execution. We formalize a user modeling framework implementing a multi-criteria utility function that balances brand identity and operational overhead. We integrated our framework into a 3D virtual tour design platform, deploying it in the ecosystem of OVS, a global fashion firm. Through a preliminary user study, we show that our solution shortens iteration cycles and decreases store-to-store discrepancies.
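As a deliberately simple illustration of what such a multi-criteria utility might look like, the sketch below scores layout candidates by a weighted trade-off; the criteria, weights, and candidate values are assumptions, not the deployed formula.

```python
# Sketch: weighted multi-criteria utility over candidate store layouts.
def layout_utility(brand_score, ops_cost, w_brand=0.5, w_ops=0.5):
    """Both inputs normalized to [0, 1]; higher utility is better."""
    return w_brand * brand_score - w_ops * ops_cost

candidates = {"layout_a": (0.9, 0.6),   # very on-brand, costly to execute
              "layout_b": (0.7, 0.2)}   # slightly less on-brand, much cheaper
best = max(candidates, key=lambda k: layout_utility(*candidates[k]))
print(best)  # layout_b under equal weights
```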

Enhancing Personalisation in Fantasy Sports with Graph-Based Representations

  • Anupam Agrawal
  • Jil Kothari
  • Siddhesh Chaubal
  • Palash Tatte

In this paper, we propose a technique that leverages graph-based representation learning using the GraphSAGE algorithm to furnish diverse personalized communications tailored to each user’s unique engagement patterns within the fantasy sports ecosystem. By curating such personalized user suggestions, we promote diverse user engagement that is more customer-centric while maintaining business metrics. We perform offline and online experiments to evaluate the effectiveness of our approaches concerning their impact on different user engagement and business metrics.
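For readers unfamiliar with GraphSAGE, here is a hedged sketch (not the production system) of a two-layer GraphSAGE encoder in PyTorch Geometric that produces user embeddings from an interaction graph; layer sizes and the toy graph are assumptions.

```python
# Sketch: GraphSAGE user encoder; downstream components could rank
# personalized communications by similarity in this embedding space.
import torch
from torch_geometric.nn import SAGEConv

class UserEncoder(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim=64, out_dim=32):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)  # one embedding per user node

x = torch.randn(5, 8)                                     # 5 users, 8 features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])   # toy interaction edges
print(UserEncoder(8)(x, edge_index).shape)                # torch.Size([5, 32])
```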

Finding Interest Needle in Popularity Haystack: Improving Retrieval by Modeling Item Exposure

  • Rahul Agarwal
  • Amit Jaspal
  • Saurabh Gupta
  • Omkar Vichare

Recommender systems operate in closed feedback loops, where user interactions reinforce popularity bias, leading to over-recommendation of already popular items while under-exposing niche or novel content. Existing bias mitigation methods, such as Inverse Propensity Scoring (IPS) and Off-Policy Correction (OPC), primarily operate at the ranking stage or during training, lacking explicit real-time control over exposure dynamics. In this work, we introduce an exposure-aware retrieval scoring approach, which explicitly models item exposure probability and adjusts retrieval-stage ranking at inference time. Unlike prior work, this method decouples exposure effects from engagement likelihood, enabling controlled trade-offs between fairness and engagement in large-scale recommendation platforms. We validate our approach through online A/B experiments in a real-world video recommendation system, demonstrating a 25% increase in uniquely retrieved items and a 40% reduction in the dominance of over-popular content, all while maintaining overall user engagement levels. Our results establish a scalable, deployable solution for mitigating popularity bias at the retrieval stage, offering a new paradigm for bias-aware personalization.
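The abstract does not give the exact formula, so the following is only a sketch of the core idea under our reading: down-weight retrieval scores by a modeled exposure probability, with a single knob controlling the fairness/engagement trade-off at inference time.

```python
# Sketch: exposure-adjusted retrieval scoring (alpha = trade-off knob).
import numpy as np

def exposure_adjusted_scores(engagement_logits, exposure_prob, alpha=0.3):
    """Higher alpha pushes retrieval away from over-exposed items."""
    return engagement_logits - alpha * np.log(np.clip(exposure_prob, 1e-6, 1.0))

logits = np.array([2.1, 1.9, 1.2])       # predicted engagement per candidate
exposure = np.array([0.80, 0.05, 0.02])  # modeled exposure probabilities
ranking = exposure_adjusted_scores(logits, exposure).argsort()[::-1]
print(ranking)  # the over-exposed first item is demoted below niche items
```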

Personalized Fashion Advertising with Large Language Models: A Case Study on Fine-Tuning for Marketing Copy Generation

  • Andrea Lops
  • Fedelucio Narducci
  • Azzurra Ragone
  • Michelantonio Trizio

The rapid digitalization of the fashion industry has transformed marketing strategies, emphasizing the need for personalized and adaptive advertising content. This paper presents a case study on fine-tuning Large Language Models (LLMs) for fashion advertising, focusing on OVS, a major Italian fashion retailer. By leveraging real-world marketing data from OVS’s newsletters and social media campaigns, we developed a fine-tuned model capable of generating engaging and stylistically coherent promotional content. To evaluate the effectiveness of this approach, we introduced a novel brand compliance index, measuring the alignment of AI-generated text with key branding requirements, such as audience targeting, event specificity, and platform appropriateness. Experimental results show that the fine-tuned model achieved a compliance score of 0.82, significantly outperforming the baseline model (0.63). Although this approach introduces a minor increase in generation latency, the enhanced alignment with brand identity justifies its use in marketing automation. Our findings highlight the potential of fine-tuned LLMs to streamline advertising content generation while maintaining brand consistency, offering valuable insights for the future of AI-driven digital marketing.

Predicting Movie Hits Before They Happen with LLMs

  • Shaghayegh Agah
  • Yejin Kim
  • Neeraj Sharma
  • Mayur Nankani
  • Kevin Foley
  • H. Howie Huang
  • Sardar Hamidian

Addressing the cold-start issue in content recommendation remains a critical ongoing challenge. In this work, we focus on tackling the cold-start problem for movies on a large entertainment platform. Our primary goal is to forecast the popularity of cold-start movies using Large Language Models (LLMs) leveraging movie metadata. This method could be integrated into retrieval systems within the personalization pipeline or could be adopted as a tool for editorial teams to ensure fair promotion of potentially overlooked movies that may be missed by traditional or algorithmic solutions. Our study validates the effectiveness of this approach compared to established baselines and those we developed.

SESSION: Doctoral Consortium Papers

AI-Assisted Learning

  • Mubina Kamberovic

Introductory programming courses present significant challenges for novice learners, often leading to frustration and difficulty in identifying learning gaps. This research aims to develop an AI-driven tool that provides personalized guidance, moving beyond traditional "one-size-fits-all" approaches. Recognizing the limitations of relying solely on digital interaction logs in the era of generative AI, we explore the integration of student personal characteristics and fine-grained programming interactions to predict learning behavior and performance. We will investigate how to accurately predict student outcomes early in the semester, analyze the dynamics of learning behaviors, and design an AI-assisted tool to recommend tailored learning materials and feedback. Our goal is to foster effective learning and mitigate the risks associated with over-reliance on general-purpose AI, ultimately enhancing knowledge retention and problem-solving skills.

Cognitive-Emotional Modeling and Hybrid Intelligence: A User-Centered Approach to Psychomotor Interventions in Active Aging

  • Dayris Rapado

Aging presents challenges to society. Scientific research suggests that regular physical activity and cognitive stimulation are essential for active aging. In this context, this research proposes a novel approach that combines hybrid intelligence and neuroscience to model people's cognitive functions and emotional state in order to generate psychomotor interventions that promote active aging.

Enabling Novices to Diagnose Robot Failures by Aligning Users' Mental Models of Robots

  • Gregory LeMasurier

As robots continue to be adopted into our everyday lives they may encounter unforeseen circumstances, resulting in failures that require assistance from nearby people. When people enter interactions with robots, they leverage their mental models of the system and its functions. These mental models are based on a person’s knowledge of and experiences with that robot and others. For this reason, the models are often incomplete or inaccurate, resulting in inefficient interactions. Understanding a complex robot and its functions is difficult, especially for novices. Therefore, when robots require assistance it is necessary for them to explain their failures in a manner that not only provides enough context for a person to resolve the error, but that also helps correct people’s misaligned mental models. Through this work, I aim to enable non-experts to more efficiently and effectively diagnose and resolve robot failures.

Integrating Indoor Positioning, Recommendation, and Personalization to Enhance Museum Visitor Experiences

  • Alessio Ferrato

Personalization in Cultural Heritage (CH) settings is crucial for transforming visitor experiences into meaningful interactions accommodating diverse expectations and preferences. This research presents a holistic framework to enhance visitor experiences in CH physical institutions, like Galleries, Libraries, Archives, and Museums, through the combination of Indoor Positioning Systems (IPS), Recommender Systems, and Large Language Models (LLMs). Our Bluetooth beacon-based IPS implementation has been successfully deployed in a major gallery in Rome. The system covers 17 rooms and over 100 artworks and provides the user’s position with high accuracy. We conceptualized a recommendation algorithm to optimize visitor engagement by progressively increasing mean dwell time while considering spatial and temporal constraints. Moreover, our experiments with LLM-generated audioguides demonstrate that visitors prefer content tailored to established visitor categories, validating our approach to personalization. These findings provide empirical support for personalized digital interpretation in GLAM contexts, though challenges remain regarding IPS precision over time and LLM hallucination mitigation. Future work will focus on collecting visitor interaction data, implementing the recommender system, and potentially releasing datasets to address the scarcity of CH-specific positioning and recommendation data.

Investigating Speech and Multimedia Integration in Assistive Robots

  • Massimo Donini

This doctoral research investigates the integration of speech and multimedia elements in Human-Robot Interaction to enhance communication between humans and assistive robots. The study aims to define, design, implement, and validate adaptive interaction models that can facilitate social engagement, cognitive stimulation, and emotional support through personalized and dynamic interactions. A key objective is to improve natural language dialogue by adapting robot communication to users’ preferences and emotions, dynamically extracted during interaction. This personalization seeks to make the “social moment” more effective, tailoring dialogue strategies to the unique characteristics of individuals.

Just a Chill Robot. Strategies for relatable and personalized Assistive Robots for Autistic Children

  • Linda Pigureddu

This research explores innovative approaches to improving Robotic Therapeutic Assistants for autistic children. It aims to design a social robot that is believable as a peer for the children, implementing a relatable personality, a youth-inspired communication style, and interests reflecting the children’s own. Robot-Assisted Therapies (RT) have proved to be engaging and to produce better outcomes with autistic children than traditional therapies. Still, there is a need for improved customisation of robot behaviour and communication, as well as greater inclusivity in a field characterised by the unbalanced participation of boys interested in robotics. Following the literature’s suggestion that these issues can be addressed by applying a user-centred approach and leveraging a dynamic user model (UM), this research offers proposals to customise the robot’s behaviour, making it a more versatile tool for therapists. As part of the “FeelGood!” project, this study benefits from the expertise of a multidisciplinary team contributing perspectives of different neurotypes. The co-design process, with the innovative idea of including both autistic and allistic children, aims to incorporate every involved group’s feedback when designing engaging, relatable and therapeutically meaningful interactions.

Segment, Recommend, and Explain: Advancing Conversational Recommender Systems with Large Language Model Agents

  • Fillipe dos Santos Silva

From personalized shopping assistants to streaming service recommendations, Conversational Recommender Systems (CRSs) have become essential tools for decision-making. However, users often struggle to understand why particular items are recommended, leading to reduced trust, lower engagement, and less effective decision-making. At the same time, CRSs usually face challenges in recommendation accuracy as they struggle to integrate historical data, real-time feedback, and contextual signals effectively. This combination of unclear explanations and suboptimal recommendations diminishes user experience and system reliability. To address these challenges, we explore three key questions: (i) how the integration of structured and unstructured data can enhance downstream tasks in a Multi-Agent System (MAS)-based CRS architecture, (ii) how a MAS can generate dynamic and user-centric explanations, and (iii) how cross-agent collaboration can optimize recommendation accuracy. MAS-based architectures distribute critical tasks such as decision-making, data retrieval, optimization, and reasoning across specialized agents. A meta-agent oversees these agents, ensuring coordination and adaptability. This research aims to enhance the clarity of explanations and improve recommendation accuracy by integrating modular agents in a coordinated MAS framework, providing more personalized explanations and adapting to evolving user preferences.

Supporting User Information Processing Through Large Language Models Within the Political Sphere

  • Neeley Pate

How do we support information processing within the political domain? By incorporating personalization and guardrails, large language model (LLM) systems can be leveraged to support navigation through the information ecosystem. In this work, I outline a proposal for designing LLM systems within two areas: mitigating misinformation belief and bolstering information processing. These tools aim to draw from theories of persuasion, information processing, and motivated reasoning to ultimately speak to the end user and nudge them to pursue accuracy when presented with information. These interventions will not only extend research within these relevant domains, but also support an individual’s ability to interpret the information provided.

Teaming in the AI Era: AI-Augmented Frameworks for Forming, Simulating, and Optimizing Human Teams

  • Mohammed Almutairi

Effective teamwork is essential across diverse domains. During the team formation stage, a key challenge is forming teams that effectively balance user preferences with task objectives to enhance overall team satisfaction. In the team performing stage, maintaining cohesion and engagement is critical for sustaining high team performance. However, existing computational tools and algorithms for team optimization often rely on static data inputs, narrow algorithmic objectives, or solutions tailored to specific contexts, failing to account for the dynamic interplay of team members’ personalities, evolving goals, and changing individual preferences. As a result, teams may encounter member dissatisfaction, since purely algorithmic assignments can reduce members’ commitment to team goals, or may experience suboptimal engagement due to the absence of timely, personalized guidance that helps members adjust their behaviors and interactions as team dynamics evolve. Ultimately, these challenges can lead to reduced overall team performance.

Driven by these challenges, my Ph.D. dissertation aims to develop AI-augmented team optimization frameworks and practical systems that enhance team satisfaction, engagement, and performance. First, I propose a team formation framework that leverages a multi-armed bandit algorithm to iteratively refine team composition based on user preferences, ensuring alignment between individual needs and collective team goals to enhance team satisfaction. Second, I introduce tAIfa (“Team AI Feedback Assistant”), an AI-powered system that utilizes large language models (LLMs) to deliver immediate, personalized feedback to both teams and individual members, enhancing cohesion and engagement. Finally, I present PuppeteerLLM, an LLM-based simulation framework that simulates multi-agent teams to model complex team dynamics within realistic environments, incorporating task-driven collaboration and long-term coordination. My work takes a human-centered approach to advance AI-driven team optimization through both theoretical frameworks and practical systems to improve team members’ satisfaction, engagement, and performance.
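As a toy illustration of the first component, the sketch below runs an epsilon-greedy bandit over candidate team compositions, refining the choice from observed satisfaction feedback; the compositions and reward values are made up for illustration.

```python
# Sketch: epsilon-greedy bandit where each arm is a candidate team composition.
import random

arms = ["team_A", "team_B", "team_C"]   # candidate compositions
counts = {a: 0 for a in arms}
values = {a: 0.0 for a in arms}         # running mean satisfaction per arm

def choose(eps=0.1):
    if random.random() < eps:
        return random.choice(arms)               # explore
    return max(arms, key=lambda a: values[a])    # exploit

for _ in range(200):
    arm = choose()
    # Simulated satisfaction feedback (illustrative means only).
    reward = random.gauss({"team_A": 0.4, "team_B": 0.7, "team_C": 0.5}[arm], 0.1)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(max(values, key=values.get))               # usually team_B
```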

Towards Intelligent VR Training: A Physiological Adaptation Framework for Cognitive Load and Stress Detection

  • Mahsa Nasri

Adaptive Virtual Reality (VR) systems have the potential to enhance training and learning experiences by dynamically responding to users’ cognitive states. This research investigates how eye tracking and heart rate variability (HRV) can be used to detect cognitive load and stress in VR environments, enabling real-time adaptation. The study follows a three-phase approach: (1) conducting a user study with the Stroop task to label cognitive load data and train machine learning models to detect high cognitive load, (2) fine-tuning these models with new users and integrating them into an adaptive VR system that dynamically adjusts training difficulty based on physiological signals, and (3) developing a privacy-aware approach to detect high cognitive load and compare this with the adaptive VR in Phase two. This research contributes to affective computing and adaptive VR using physiological sensing, with applications in education, training, and healthcare. Future work will explore scalability, real-time inference optimization, and ethical considerations in physiological adaptive VR.

Towards Personalized Physiotherapy via Common Semantic Fusion: Multi-Modal Learning, Computer Vision and Empathetic NLP

  • Victor Garcia

This research develops an AI-driven framework for personalized physiotherapy by integrating multi-modal learning, computer vision, and empathetic NLP. It focuses on user modeling and personalization to enhance physiotherapy assessments via an optimized YOLO Pose algorithm, fusing visual, auditory, and textual data for comprehensive mobility evaluation. Preliminary results show improved pose estimation, supporting the potential for clinical validation and integration of additional modalities such as inertial sensors.

TOC Adjunct Proceedings

The ACM OpenTOC service enables visitors to download the articles below at no charge.

Open Adjunct Proceedings

UMAP Adjunct '25: Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization

UMAP Adjunct '25: Adjunct Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization

Full Citation in the ACM Digital Library

SESSION: Tutorials

Conducting Recommender Systems User Studies Using POPROX

  • Robin Burke
  • Michael Ekstrand

The Platform for OPen Recommendation and Online eXperimentation (POPROX) is a new resource to allow recommender systems and personalization researchers to conduct online user research without having to develop all of the necessary infrastructure and recruit users themselves. Our first domain is personalized news recommendations: POPROX 1.0 provides a daily newsletter (with content from the Associated Press) to users who have already consented to participate in research, along with interfaces and protocols to support researchers in conducting studies that assign subsets of users to various experimental algorithms and/or interfaces.

The purpose of this tutorial is to introduce the platform and its capabilities to researchers in the UMAP community who may be interested in using the system. Participants will walk through the implementation of a sample experiment to demonstrate the mechanics of designing and running user studies with POPROX.

Conducting User Experiments with Personalized Systems

  • Bart P. Knijnenburg

This tutorial provides practical training in designing and conducting online user experiments with personalized systems, and in statistically analyzing the results of such experiments. This tutorial will be useful for anyone seeking to conduct user-centric evaluations of UMAP systems—ranging from adaptive user interfaces to human-like conversational artificial intelligence (AI) systems powered by large language models (LLMs). It covers the development of a research question and hypotheses, the selection of study participants, the manipulation of system aspects and measurement of behaviors, perceptions and user experiences, and the evaluation of subjective measurement scales and study hypotheses. Slides will be made available at https://www.usabart.nl/QRMS/.

Data Access under the EU Digital Services Act and its Impact on User Modelling Research

  • Erasmo Purificato
  • Ludovico Boratto
  • João Vinagre

The Digital Services Act (DSA) establishes a regulatory framework for online platforms and search engines in the European Union, focusing on mitigating systemic risks such as illegal content dissemination, fundamental rights violations, and impacts on electoral processes, public health, and gender-based violence. Very Large Online Platforms (VLOPs) and Very Large Search Engines (VLOSEs), defined as those with over 45 million active recipients, must provide data access for research to enable investigations into these risks and the development of solutions. This tutorial is tailored for the UMAP community, addressing the implications of the DSA for user modelling research. It will cover the DSA’s key provisions and definitions, outline the procedural steps for accessing VLOP and VLOSE data, and discuss the technical aspects of data access requests. Participants will also explore the challenges and opportunities involved in working with this data. By the end of the tutorial, attendees will have a thorough understanding of the DSA’s data access provisions, the technical and procedural requirements for accessing VLOP and VLOSE data, and the regulation’s implications for user modelling research. They will be equipped to navigate the complexities of the DSA and contribute to the development of responsible and transparent online platforms.

Further information and resources about the tutorial are available on the website: https://erasmopurif.com/tutorial-dsa-umap25/.

Designing Intelligent User Interfaces for Well-Being (Tutorial)

  • Ernesto William De Luca
  • Julian Marvin Joers
  • Marko Tkalcic

Well-being as a paradigm shift in human-computer interaction (HCI) is becoming increasingly important, especially due to the rise of proactive interfaces and the use of artificial intelligence. Beyond the usability and simplicity of interaction, HCI research is confronted with long-term perspectives that demand alternative doctrines, especially concerning the well-being of humans and societies. In this tutorial, three separate research areas are joined to create a comprehensive tutorial on considering well-being orientations in developing intelligent user interfaces. Firstly, the development of intelligent user interfaces is highlighted from an artificial intelligence (AI) engineering perspective, using modality-based thinking and algorithmic evaluation to build well-being-centered systems. Secondly, this perspective is extended by the human-centered artificial intelligence (HCAI) approach. Lastly, the tutorial shows how the actual interaction can be developed from a human-centered design perspective with regard to well-being orientations, manifesting specific interaction principles.

Human-Centered and Sustainable Recommender Systems

  • Allegra De Filippo
  • Ludovico Boratto
  • Giuseppe Spillo

This tutorial explores the intersection of sustainability and recommender systems, focusing on aligning user needs and values with sustainable practices. It emphasizes two dimensions: (1) understanding and modeling users to deliver more sustainable recommendations; and (2) fostering sustainability through system design and functionality. Participants will learn how recommender systems can encourage sustainable behaviors and how to enhance system efficiency while minimizing resource consumption and ethical challenges. Through theoretical insights and hands-on sessions, this tutorial proposes discussion and actionable strategies to design human-centered, sustainable recommender systems, addressing both societal impact and technological responsibility.

SESSION: Late-Breaking Results and Demo Papers

Beyond Demographics: Evaluating News Recommender Systems Fairness Through Behavioural Communities

  • Bernhard Steindl
  • Thomas Elmar Kolb
  • Julia Neidhardt

Fairness in recommender systems is often framed around demographic attributes. In this work, we explore a novel direction—evaluating fairness across latent behavioural communities derived from user interactions on a real-world news platform. Using graph-based community detection (Louvain and Infomap), we identify large user groups and examine how different network modelling choices affect fairness outcomes in both traditional and fairness-aware recommender systems. Experiments on an Austrian news dataset reveal that small changes in graph construction considerably impact community formation and recommendation quality. Notably, fairness-aware algorithms show only marginal improvements over standard approaches, underscoring the complexity of achieving equitable outcomes in real-world systems and raising important questions for future research.
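To illustrate the community-detection step, here is a minimal sketch using networkx's Louvain implementation; the toy graph stands in for the user-interaction graph built from news-platform logs.

```python
# Sketch: detect behavioural communities with Louvain, then evaluate
# recommendation quality per community for fairness comparisons.
import networkx as nx

G = nx.karate_club_graph()   # placeholder for a user-interaction graph
communities = nx.community.louvain_communities(G, seed=0)
for i, c in enumerate(communities):
    print(f"community {i}: {len(c)} users")
# A fairness audit would then compare per-community recommendation metrics.
```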

BIRD: A Museum Open Dataset Combining Behavior Patterns and Identity Types to Better Model Visitors' Experience

  • Worm Alexanne
  • Florian Marchal
  • Sylvain Castagnos

Lack of data is a recurring problem in Artificial Intelligence, as it is essential for training and validating models. This is particularly true in the field of cultural heritage, where the number of open datasets is relatively limited and where the data collected does not always allow for holistic modeling of visitors’ experience due to the fact that data are ad hoc (i.e. restricted to the sole characteristics required for the evaluation of a specific model). To overcome this lack, we conducted a study between February and March 2019 aimed at obtaining comprehensive and detailed information about visitors, their visit experience and their feedback. We equipped 51 participants with eye-tracking glasses, leaving them free to explore the 3 floors of the museum for an average of 57 minutes, and to discover an exhibition of more than 400 artworks. On this basis, we built an open dataset combining contextual data (demographic data, preferences, visiting habits, motivations, social context...), behavioral data (spatiotemporal trajectories, gaze data) and feedback (satisfaction, fatigue, liked artworks, verbatim...). Our analysis made it possible to re-enact visitor identities combining the majority of characteristics found in the literature [3, 8, 9, 10, 16, 19] and to reproduce the Veron and Levasseur profiles [17]. This dataset will ultimately make it possible to improve the quality of recommended paths in museums by personalizing the number of points of interest (POIs), the time spent at these different POIs, and the amount of information to be provided to each visitor based on their level of interest. Dataset URL: https://mbanv2.loria.fr/

Conceptual Framework for Group Dynamics Modeling from Group Chat Interactions

  • Esma Karahodža
  • Amra Delić
  • Francesco Ricci

Group Recommender Systems aim to support groups in making collective decisions, and research has consistently shown that the more we understand about group members and their interactions, the better support such systems can provide. In this work, we propose a conceptual framework for modeling group dynamics from group chat interactions, with a particular focus on decision-making scenarios. The framework is designed to support the development of intelligent agents that provide advanced forms of decision support to groups. It consists of modular, loosely coupled components that process and analyze textual and multimedia content, which is shared in group interactions, to extract user preferences, emotional states, interpersonal relationships, and behavioral patterns. By incorporating sentiment analysis, summarization, dialogue state tracking, and conflict resolution profiling, the framework captures both individual and collective aspects of group behavior. Unlike existing approaches, our model is intended to operate dynamically and adaptively during live group interactions, offering a novel foundation for group recommender and decision support systems.

Content-Based or Collaborative? Insights from Inter-List Similarity Analysis of ChatGPT Recommendations

  • Dario Di Palma
  • Giovanni Maria Biancofiore
  • Vito Walter Anelli
  • Fedelucio Narducci
  • Tommaso Di Noia

ChatGPT has demonstrated remarkable versatility across various domains, including Recommender Systems (RSs). Unlike traditional RSs, ChatGPT generates recommendations through natural language, leveraging contextual cues and large-scale knowledge representations. However, it remains unclear whether these recommendations implicitly encode collaborative patterns, rely on semantic item similarities, or follow a fundamentally different paradigm. In this work, we systematically analyze ChatGPT’s recommendation behavior by comparing its generated lists to collaborative and content-based filtering baselines across three domains: Books, Movies, and Music. Using established list similarity metrics, we quantify the alignment of ChatGPT’s recommendations with traditional paradigms. Additionally, we investigate the most recommended items by ChatGPT and the other recommenders, comparing the distribution of frequently recommended items across models. Our findings reveal that ChatGPT exhibits strong similarities to collaborative filtering (CF) and amplifies popular yet underrepresented items in the dataset, suggesting a broader domain knowledge encoded in the language model and the need for future research on leveraging LLMs for recommendation tasks.
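As one example of the kind of list-comparison measure involved, the sketch below computes Jaccard overlap between top-k recommendation lists from ChatGPT and a collaborative-filtering baseline; the lists are made up for illustration.

```python
# Sketch: inter-list similarity via Jaccard overlap at cutoff k.
def jaccard_at_k(list_a, list_b, k=10):
    a, b = set(list_a[:k]), set(list_b[:k])
    return len(a & b) / len(a | b) if a | b else 0.0

chatgpt_recs = ["Dune", "Inception", "Arrival", "Interstellar"]
cf_recs      = ["Inception", "Arrival", "Tenet", "Memento"]
print(f"Jaccard@4: {jaccard_at_k(chatgpt_recs, cf_recs, k=4):.2f}")  # 0.33
```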

Enhancing News Recommendation Systems Using Multitask Learning

  • Paolo Masciullo
  • Andrea Pazienza
  • Claudio Pomo
  • Fedelucio Narducci
  • Tommaso Di Noia

Multitask learning (MTL) has emerged as a promising paradigm for improving recommendation systems by learning multiple related tasks together. In this paper, we present significant improvements in personalized news recommendation by integrating auxiliary and cascaded tasks. Our study compares single-task learning (STL) models with multitask architectures. We evaluated performance on the primary task of news-click prediction, along with predicting user interest in news categories and topics as auxiliary tasks, and fully scrolled prediction and reading pattern prediction as cascaded tasks. Our results indicate that MTL models outperform STL baselines in terms of AUC metrics. These results underscore the benefits of using multiple related tasks to capture richer signals of user behavior, while also highlighting challenges that remain, such as effectively integrating non-click samples in cascaded tasks.
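The following minimal sketch shows the multitask pattern described above: a shared encoder with a primary click head and an auxiliary category-interest head. Layer sizes and task names are illustrative assumptions, not the paper's architecture.

```python
# Sketch: shared-encoder multitask model for news recommendation.
import torch
import torch.nn as nn

class NewsMTL(nn.Module):
    def __init__(self, in_dim=128, hidden=64, n_categories=20):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.click_head = nn.Linear(hidden, 1)                 # primary task
        self.category_head = nn.Linear(hidden, n_categories)   # auxiliary task

    def forward(self, x):
        h = self.shared(x)
        return self.click_head(h).squeeze(-1), self.category_head(h)

model = NewsMTL()
click_logit, cat_logits = model(torch.randn(4, 128))
# Total loss = BCE(click) + lambda * CE(category); lambda tunes the trade-off.
```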

Fine-Tuning Large Multimodal Models for Fitness Action Quality Assessment

  • Gaetano Dibenedetto
  • Elio Musacchio
  • Marco Polignano
  • Pasquale Lops

Action Quality Assessment (AQA) plays an important role in evaluating human performance in different domains, including fitness, sports, and healthcare. This work introduces a novel AQA approach by fine-tuning large multimodal models (LMMs) for personalized activity evaluation. We used the Fitness-AQA Dataset, which provides detailed annotations of exercise errors under realistic conditions, and we adapted the LLaVA-Video model, a state-of-the-art LMM comprising the Qwen2 large language model and the SigLIP vision encoder. We implemented a customized data preparation pipeline that transforms video-based exercise annotations into a conversational format suitable for fine-tuning. To our knowledge, this study is among the first to fine-tune LMMs for AQA tasks and the very first to explore activity evaluation in this context. The experimental evaluation shows that our model achieves results slightly lower than the baseline, even though it is able to generalize across multiple exercises. The fully reproducible code is available on GitHub: https://github.com/GaetanoDibenedetto/UMAP25.

From Overall Sentiment to Aspect-Level Insights: A Pretraining Strategy for Unsupervised Aspect-based Sentiment Analysis

  • Simone Prete
  • Giovanni Maria Biancofiore
  • Fedelucio Narducci
  • Eugenio Di Sciascio
  • Tommaso Di Noia

Aspect-Based Sentiment Analysis (ABSA) aims to identify sentiments associated with specific aspects within a text. It plays a crucial role in applications such as product reviews and customer feedback analysis, where understanding nuanced opinions is essential. However, progress in ABSA remains constrained by the need for fine-grained labeled data, limiting the applicability of supervised models in real-world scenarios. In this study, we propose an unsupervised transformer-based approach that leverages sentence-level sentiment annotations to induce aspect-level sentiment representations. By supervising attention distributions during pretraining, our model learns to aggregate token-level sentiment cues into context-aware aspect sentiment predictions aligned with sentence-level supervision. We further introduce an attention-based correction mechanism to refine aspect sentiment classification by accounting for the local context of each aspect term. Evaluated on benchmark datasets including Restaurants, Laptops, and Twitter domains, our method outperforms unsupervised baselines on aspect category classification while remaining comparable with strong supervised baselines on aspect term sentiment tasks. These results demonstrate that attention-guided pretraining enables robust, domain-adaptive ABSA without requiring aspect-level supervision.

Human-AI Collaborated Ideation for Learning Reduce & Reuse Waste

  • Qiming Sun
  • I-Han Hsiao

This work presents a platform that facilitates sustainability learning through waste management principles - reducing and reusing waste from everyday items. The platform features (a) an AI assistant that helps users express and develop their ideas about sustainable practices and (b) a commentary interface to support asynchronous collaboration. The AI agent provides context-aware suggestions while encouraging users to modify and personalize these recommendations, creating a collaborative approach to sustainability ideation. A user study was conducted, and the effectiveness of the approach was evaluated. Results demonstrated increased sustainability awareness among participants after using the platform, with varying patterns of improvement across different sustainability approaches. In particular, users who actively modified AI suggestions produced higher-quality contributions with more specific actionable recommendations compared to those who directly copied AI’s responses. The study also reveals insights into how AI assistance affects content quality. These findings contribute to understanding how AI can be effectively integrated into sustainability education platforms to enhance learning outcomes.

InteractiveReq: Enhancing Software Requirement Specification with Critiquing-based Recommender Systems

  • Sebastian Lubos
  • Alexander Felfernig
  • Damian Garber
  • Viet-Man Le
  • Manuel Henrich
  • Reinhard Willfort
  • Ivan Dukic

Specifying user stories and epics for feature requirements in agile software development is essential but time-consuming, demanding significant stakeholder effort. To improve this situation, we introduce InteractiveReq, an interactive critiquing-based recommender system that simplifies the generation of high-quality requirements. By using Large Language Models (LLMs), InteractiveReq enables an iterative process where stakeholders refine feature requirements through interactive feedback, addressing the issue of incomplete initial specifications. The system recommends custom drafts of epics and user stories based on the project context and an initial feature description, which users can refine through natural language critiques until their needs are satisfied. This approach aims to reduce workload and offers an intuitive method for requirement management. Preliminary results indicate that InteractiveReq effectively supports the creation of complete and accurate specifications.

Investigating AI in Programming Education: Self-Reported AI Usage, Individual Traits, and Learning Outcomes

  • Mubina Kamberovic
  • Amra Delic
  • Senka Krivic

Understanding how students perceive and utilize Large Language Models (LLMs), and how these interactions relate to their learning behavior and individual differences, is crucial for optimizing educational processes and outcomes. This paper introduces a novel dataset comprising weekly self-reported data from students in an introductory programming course, i.e., students’ AI tool usage, perceived difficulty of weekly subject areas, personality traits, preferred learning styles, and general attitudes toward AI. We present a descriptive overview of the collected data and conduct a correlation analysis to gain first insights into the students’ individual differences and their learning outcomes, frequency of AI tool usage, as well as their attitudes toward AI. The findings reveal that while individual student characteristics did not show significant correlations with final performance or frequency of AI tool usage, the combination of students’ expectations for success and their perceived value of the task (constructs of expectancy theory) was significantly associated with both course outcomes and how often they used the AI tool. Additionally, motivational factors may be key to fostering positive attitudes toward AI, while personality traits, particularly those related to negative emotionality, may play a more significant role in shaping resistance. This initial analysis lays the groundwork for future investigations into the prospects of AI in supporting students’ learning processes.

It's Time to Let Go: Stopping Criteria Recommendations in Content-rich Domains

  • Matej Scerba
  • Ladislav Peska

When shopping for new products, people typically adopt maximizer or satisficer behavior patterns. While satisficers stop searching as soon as they find a suitable product, maximizers seek the best option among all available choices. Even though most people normally behave as satisficers, they tend to adopt maximizer patterns in high-stakes decisions. In this work, we argue that contemporary e-commerce solutions are well suited to supporting satisficers but often lack features that assist maximizers. Among this missing functionality, cut-off alerts, i.e., reassurances that the user has already covered all or most of the potentially relevant options, have not yet been explored in related research. To address this gap, we leverage the observation that high-stakes decisions often occur in content-rich domains. Building on this, we propose an enhanced human-computer interaction model incorporating contextual explanations and cut-off alerts. The proposed functionality was evaluated in a user study, where the stopping-criteria interface variant substantially outperformed the unseen-statistics variant, which resembles the interfaces commonly available in e-commerce.

JARVIS: Adaptive Dual-Hemisphere Architectures For Personalized Large Agentic Models

  • Francesco Manco
  • Domenico Roberto
  • Marco Polignano
  • Giovanni Semeraro

In this work, we propose JARVIS. It aims to provide LLMs with a stronger degree of personalization via a two-hemisphere architecture inspired by the biological organization of the human brain, following a Large Agentic Model (LAM) architecture. The subjective hemisphere operates by dynamically modeling the user’s preferences and iteratively optimizing its behaviors, through a training phase grounded on LoRA (Low-Rank Adaptation), DPO (Direct Preference Optimization), human feedback, and synthetic data (“digital dreams”). Conversely, the objective hemisphere serves a rational-like role, reducing hallucination and the chances of spreading dangerous misinformation by using more structured approaches. In JARVIS, both hemispheres are grounded on a dual-level memory capability. Short-term memory keeps track of recent preferences, ensuring continuity in dialogues and ongoing user behaviors and interactions. Long-term memory is gradually developed to collect the user’s enduring preferences, skills, and general behavioral routines. Unlike current state-of-the-art approaches, JARVIS provides a personalized and context-aware alternative, facilitating seamless and fluent interactions with the end user.

Personalized Preference Profiles for Preference Discovery

  • Sushmita Khan
  • Mehtab Iqbal
  • Laila Shafiee
  • Aminata Ndiaye Mbodj
  • Bart Knijnenburg

Effective recommender systems rely on the assumption of well-defined user preferences, an assumption frequently violated in practice. To assist users in developing and understanding their preferences, we designed six different visualizations that juxtapose users’ predicted preferences against those of a larger audience. We conducted think-aloud studies to investigate which visualization best helps users develop and understand their preferences. Our findings contribute to a broader call to develop recommender systems that support users’ self-actualization and long-term perspective.

Simulating Human Opinions with Large Language Models: Opportunities and Challenges for Personalized Survey Data Modeling

  • Carolin Kaiser
  • Jakob Kaiser
  • Vladimir Manewitsch
  • Lea Rau
  • Rene Schallner

Public and private organizations rely on opinion surveys to inform business and policy decisions. Yet, empirical surveys are costly and time-consuming. Recent advances in large language models (LLMs) have sparked interest in generating synthetic survey data, i.e., simulated answers based on target demographics, as an alternative to real human data. But how well can LLMs replicate human opinions? In this ongoing project, we develop and critically evaluate methods for synthetic survey sampling. As an empirical benchmark, we collected responses from a representative U.S. sample (n = 461) on preferences for a common consumer good (soft drinks). Then, we developed ASPIRE (Automated Synthetic Persona Interview and Response Engine), a tool that pairs each human participant with a “digital twin” based on their demographic profile and generates synthetic responses via LLM technology. Synthetic data achieved better-than-chance accuracy in matching human responses and approximated aggregate subjective rankings for both binary and Likert-scale items. However, LLM-simulated data overestimated humans’ tendencies to provide positive ratings and exhibited substantially reduced variance compared to real data. The match of synthetic and real data was not systematically related to participants’ age, gender, or ethnicity, indicating no demographic bias. Overall, while synthetic sampling shows promise for modeling aggregate opinion trends, it currently falls short in replicating the variability and complexity of real human opinions. We discuss insights from our ongoing project for accurate and responsible user opinion modeling via LLMs.
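As a sketch only of how a "digital twin" prompt might be assembled from a participant's demographic profile, consider the following; the wording and profile fields are assumptions, not the ASPIRE implementation.

```python
# Hypothetical digital-twin persona prompt builder (illustrative sketch).
def twin_prompt(profile: dict, question: str, options: list[str]) -> str:
    persona = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return (
        f"Adopt the persona of a survey respondent ({persona}).\n"
        f"Question: {question}\n"
        f"Answer with exactly one of: {options}."
    )

print(twin_prompt({"age": 34, "gender": "female", "region": "Midwest"},
                  "Which soft drink do you prefer?", ["cola", "lemonade"]))
```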

Social Influence and the Perceived Fairness of Algorithmic Decisions: An Exploratory Study

  • Styliani Kleanthous
  • Anthi Ioannou
  • Filippos Papandreou

AI-driven decision-support systems (DSSs) are increasingly shaping critical choices across industries, yet non-experts often struggle to understand and assess their fairness. This study explores how individuals engage with AI-generated decisions that impact others and how social factors, such as majority or minority opinions, influence fairness judgments. Our initial findings highlight that different groups vary in their interest in understanding AI decisions, with social influence playing a key role in shaping fairness perceptions. These insights provide a valuable foundation for future research on transparency and trust in AI decision-making.

Soundtracks of Our Lives: How Age Influences Musical Preferences

  • Arsen Matej Golubovikj
  • Bruce Ferwerda
  • Alan Said
  • Marko Tkalcic

The majority of research in recommender systems, be it on algorithmic improvements, context-awareness, explainability, or other areas, evaluates these systems on datasets that capture user interaction over a relatively limited time span. However, recommender systems may well be used continuously for extended periods, and user behavior may evolve over that time. Although media studies and psychology offer a wealth of research on how user preferences and behavior evolve as individuals age, there has been scant research in this regard within the realm of user modeling and recommender systems. In this study, we investigate the evolution of user preferences and behavior using the LFM-2b dataset, which, to our knowledge, is the only dataset that encompasses a sufficiently extensive time frame to permit real longitudinal studies and includes age information about its users. We identify specific usage and taste preferences directly related to the age of the user: while younger users tend to listen broadly to contemporary popular music, older users have more elaborate and personalized listening habits. The findings yield important insights that open new directions for research in recommender systems, providing guidance for future efforts.

Stealthy LLM-Driven Data Poisoning Attacks Against Embedding-Based Retrieval-Augmented Recommender Systems

  • Fatemeh Nazary
  • Yashar Deldjoo
  • Tommaso Di Noia
  • Eugenio Di Sciascio

We present a systematic study of provider-side data poisoning in retrieval-augmented recommender systems (RAG-based). By modifying only a small fraction of tokens within item descriptions—for instance, adding emotional keywords or borrowing phrases from semantically related items—an attacker can significantly promote or demote targeted items. We formalize these attacks under token-edit and semantic-similarity constraints, and we examine their effectiveness in both promotion (long-tail items) and demotion (short-head items) scenarios. Our experiments on MovieLens, using two large language model (LLM) retrieval modules, show that even subtle attacks shift final rankings and item exposures while eluding naive detection. The results underscore the vulnerability of RAG-based pipelines to small-scale metadata rewrites, and emphasize the need for robust textual consistency checks and provenance tracking to thwart stealthy provider-side poisoning.

Tell Me the Good Stuff: User Preferences in Movie Recommendation Explanations

  • Juan Ahmad
  • Jonas Hellgren
  • Alan Said

Recommender systems play a vital role in helping users discover content in streaming services, but their effectiveness depends on users understanding why items are recommended. In this study, explanations were based solely on item features rather than personalized data, simulating recommendation scenarios. We compared user perceptions of one-sided (purely positive) and two-sided (positive and negative) feature-based explanations for popular movie recommendations. Through an online study with 129 participants, we examined how explanation style affected perceived trust, transparency, effectiveness, and satisfaction. One-sided explanations consistently received higher ratings across all dimensions. Our findings suggest that in low-stakes entertainment domains such as popular movie recommendations, simpler positive explanations may be more effective. However, the results should be interpreted with caution due to potential confounding factors such as item familiarity and the placement of negative information in explanations. This work provides practical insights for explanation design in recommender interfaces and highlights the importance of context in shaping user preferences.

Thalamus: A User Simulation Toolkit for Prototyping Multimodal Sensing Studies

  • Kayhan Latifzadeh
  • Luis A. Leiva

Conducting user studies that involve physiological and behavioral measurements is time-consuming and expensive, as it requires not only careful experiment design, device calibration, and similar preparations, but also careful software testing. We propose Thalamus, a software toolkit for collecting and simulating multimodal signals that can help experimenters prepare in advance for unexpected situations, before reaching out to actual study participants and even before having to install or purchase a specific device. Among other features, Thalamus allows the experimenter to modify, synchronize, and broadcast physiological signals (as coming from various data streams) from different devices simultaneously, even when those devices are not located in the same place. Thalamus is cross-platform, cross-device, and simple to use, making it a valuable asset for HCI research.

The Kimono Era Has Long Passed: Generative AI-Assisted Reminiscence Therapy for Individuals with Late-Stage Dementia

  • Liuru Nan
  • Panote Siriaraya
  • Wan Jou She
  • Noriaki Kuwahara

This paper presents our study on implementing generative AI technology to enhance Reminiscence Therapy (RT). Building on the promising feedback from an earlier prototype tested with individuals aged 70 who were pre-diagnosed with dementia, we refined the system and conducted further evaluations with nonagenarians (aged 90+) diagnosed with late-stage dementia, a group which has been largely underrepresented in existing research. Our system incorporated an image-augmented interaction mechanism, using Azure Computer Vision to analyze photos and Azure Cognitive Services to convert users’ voice input into generative prompts. GPT-4o and Stable Diffusion were used to converse with users and for image generation. Through our system, users were able to engage in conversation and discuss their past experiences as part of a reminiscence exercise, during which corresponding images were generated to increase the vividness of their reminiscence experience. Four pairs of nonagenarians with late-stage dementia and their caregivers participated in our study and used our system. Based on interviews conducted with their caregivers, we found that, due to their severe decline in cognitive capacity, nonagenarians with late-stage dementia were unlikely to tolerate cultural misrepresentation or draw meaningful inferences from the AI-generated images. Moreover, an iterative approach to generating final images might be more effective, as individuals with late-stage dementia could struggle to express themselves fully. Finally, we report the findings and lessons learned from this group of participants and reflect on the practical impact of our study.

Towards Personalized and Contextualized Code Explanations

  • Michelle Brachman
  • Arielle Goldberg
  • Andrew Anderson
  • Stephanie Houde
  • Michael Muller
  • Justin D. Weisz

Code understanding is a common and important use case for generative AI code assistance tools. Yet, a user’s background, context, and goals may impact the kinds of code explanations that best fit their needs. Our aim was to understand the kinds of configurations users might want for their code explanations and how those relate to their context. We ran an exploratory study with a medium-fidelity prototype and 10 programmers. Participants valued having configurations and desired automated personalization of code explanations. They found particular merit in being able to configure the structure and detail level in code explanations and felt that their needs might change depending on their prior experience and goals.

Words reveal wants: How well can simple LLM-based AI agents replicate people’s choices based on their social media posts

  • Sofie Goethals
  • Johannes Luther
  • Sandra Matz

As artificial intelligence systems take on increasingly agentic roles, they begin making decisions on behalf of users rather than merely supporting them. Consequently, it becomes crucial to understand how closely these systems can replicate human choices. In this study, we examine the extent to which digital traces of user behavior can serve as a foundation for modeling individual preferences. Specifically, we use Facebook status updates, a form of self-disclosed digital traces. Based on these digital traces, the goal is to predict users’ Facebook likes across various categories (e.g., Food, Movies, Public Figures, etc.), which serve as behavioral expressions of preference. Tested on over 10,000 queries, we find that most categories achieve a prediction accuracy exceeding 60%, indicating generally robust performance of the Large Language Model. These findings suggest that digital traces such as Facebook status updates can reveal meaningful patterns that allow AI systems to learn more about decisions in other contexts.

SESSION: 6th Workshop on Adapted intEraction with SociAl Robots (cAESAR)

6th Workshop on Adapted intEraction with SociAl Robots (cAESAR)

  • Francesca Cocchella
  • Alberto Lillo
  • Giuseppe Palestra
  • Luca Raggioli
  • Giulia Scorza Azzarà
  • Cristina Gena

Human Robot Interaction (HRI) is a field of study dedicated to understanding, designing, and evaluating robotic systems for use by, or with, humans. In HRI there is a consensus that robotic systems should be able to adapt their behavior to the user’s actions. The robot should adapt to emotions and personalities, and it should also retain a memory of past interactions with the user in order to be believable. This is of particular importance in the field of social robotics and social HRI. The aim of this workshop is to bring together researchers and practitioners who are working on various aspects of social robotics and adaptive interaction. The expected result of the workshop is a multidisciplinary research agenda that will inform future research directions and, hopefully, forge new research collaborations.

Computational Models of Cognitive and Affective Theory of Mind

  • Luca Raggioli
  • Alessandra Rossi
  • Silvia Rossi

Theory of Mind (ToM) is described as the capability to attribute mental states to oneself and others, and it can be essential for robots to foster more collaborative, adaptive, and emotionally appropriate behaviors when they are deployed in human-centered environments. In this work, we survey existing methodologies for introducing ToM into Human-Robot Interaction, focusing on two main formalizations: Cognitive ToM, which concerns reasoning about beliefs and intentions in a more task-oriented way, and Affective ToM, which requires the agent to recognize and adapt to others’ emotional states. While both approaches have advanced robot adaptability and user engagement, they are often developed in isolation. For this reason, we discuss the need for integrated models that combine cognitive and affective reasoning, and outline future directions for more socially intelligent and emotionally aware robotic systems.

Multimodal LLM Question Generation for Children's Art Engagement via Museum Social Robots

  • Alessio Ferrato
  • Cristina Gena
  • Carla Limongelli
  • Giuseppe Sansonetti

This paper proposes using social robots to enhance children’s experiences in museums. Specifically, we aim to equip these social robots with multimodal large language models (MLLMs) to generate questions that engage children interactively. To achieve this, we evaluate the capabilities of LLaVA models in generating diverse and relevant questions about artworks, comparing their performance on visual questions with contextual questions. We utilize a subset of the AQUA dataset to assess both quantitative metrics and qualitative aspects of the generated questions. Additionally, we examine the models’ ability to create engaging questions tailored specifically for children. We emphasize how MLLMs can generate questions that may increase enjoyment during visits, promote active observation, and enhance children’s cognitive and emotional engagement with artworks. This approach aims to contribute to more inclusive and effective learning experiences in museum settings.

Quantum-Enhanced Social Robotics: The QUADRI Project

  • Berardina Nadja De Carolis
  • Corrado Loglisci
  • Maria Grazia Miccoli
  • Giuseppe Palestra
  • Sergio Violante

This paper presents a review of the integration of quantum computing techniques into social robotics, focusing on the potential for enhancing robot adaptability, decision-making, and emotional intelligence. We analyze the current state of research, examining how quantum algorithms, such as Grover’s algorithm, can be applied to improve human-robot interaction. The review, as a task of the QUADRI project (QUAntum-enhanceD human-Robot Interaction: Pioneering Intelligent Social Robotics), provides a comprehensive overview of the opportunities and challenges in this emerging field, setting the foundation for future research and practical applications in domains such as mental health, education, and workplace stress management.

Robots adapting to dogs: a new frontier?

  • Angelo Paloka
  • Alberto Lillo
  • Fabiana Vernero
  • Filipa Correia
  • Valentina Nisi
  • Laura Lossi
  • Cristina Gena

While Human-Robot Interaction (HRI) has seen extensive exploration, Animal-Robot Interaction (ARI) remains a less mature field. This paper presents a first AI-based prototype designed to enable a humanoid robot to recognize emotional and postural states in dogs and adapt its behavior accordingly. Using a deep learning-based pipeline for real-time detection and classification, the robot could adapt its movements to better accommodate canine responses. We propose that such an adaptive approach paves the way for more natural coexistence between robots and animals in domestic settings, raising new challenges in perception, behavior design, and ethics within ARI.

Towards a Structured Multimodal Speech-Image Coordination

  • Massimo Donini
  • Michael Oliverio
  • Pier Felice Balestrucci
  • Luca Anselma
  • Cristina Gena
  • Alessandro Mazzei
  • Matteo Nazzario
  • Irene Borgini

This paper presents a novel approach to the use of the humanoid robot Pepper in educational contexts, focusing on dialogic and multimodal interaction for teaching abstract concepts in mathematics and physics. Unlike traditional lecture-based models, our system supports learner-centered lessons where students engage in spontaneous dialogue with the robot while interacting with visual content displayed on its tablet. The robot responds to user questions through the coordination of speech, synchronized textual output, and dynamic visual cues, such as real-time modifications to vectorial images that highlight relevant elements. To promote accessibility and engagement, the system includes customizable visual features such as color schemes and adjustable font settings for visually impaired users. This approach aims to foster a more inclusive, personalized, and interactive learning experience by adapting the lesson content to the learner’s interests and inquiries.

vNAO: Virtual NAO as a Cognitive Companion for the Elderly

  • Alberto Lillo
  • Claudio Mattutino
  • Cristina Gena

This paper presents the development of a simulated assistive system based on the NAO humanoid robot, designed to support cognitive engagement and well-being in elderly users. Leveraging the Webots simulation environment, we integrated advanced functionalities including voice interaction through Google Speech Recognition, contextual dialogue using the LLaMA language model, and speech synthesis via pyttsx3. The system enables the virtual NAO (vNAO) to conduct conversational interactions, administer cognitive exercises, issue reminders, and guide users through physical activities, all within a personalized, elderly-friendly virtual environment. Our implementation demonstrates that a simulation-based approach can provide a scalable, accessible framework for testing and deploying socially assistive robotics.

SESSION: 7th Workshop on Explainable User Models and Personalised Systems (ExUM 2025)

7th Workshop on Explainable User Models and Personalised Systems (ExUM 2025)

  • Cataldo Musto
  • Marco Polignano
  • Amon Rapp
  • Giovanni Semeraro
  • Jürgen Ziegler

In recent years, adaptive and personalized systems, underpinned by cutting-edge technologies such as Large Language Models (LLMs), have emerged as pivotal forces in reshaping the digital landscape. These systems, seamlessly woven into the fabric of everyday life, manifest in diverse forms—from conversational agents that emulate human-like dialogue to recommendation algorithms that tailor content like music, films, and consumer products to individual preferences. Their pervasive integration into digital platforms has fundamentally altered the ways in which users engage with information, make decisions, and interact with technology, positioning them as indispensable tools in modern society. The transformative potential of these technologies lies in their ability to enhance user engagement, streamline content delivery, and support decision-making processes with unprecedented precision. However, as their influence continues to expand, so too does the urgency to confront critical challenges surrounding transparency, fairness, and user trust. The ExUM workshop seeks to explore these pressing issues, advocating for a balanced approach that prioritizes not only technological efficacy but also ethical integrity and user empowerment.

Developing Human-Centered Intelligent Learning Systems: the application of CARAIX framework

  • Miguel Portaz
  • Angeles Manjarrés
  • Olga C. Santos
  • Raúl Cabestrero
  • Pilar Quirós
  • Mar Hermosilla
  • David Puertas-Ramirez
  • Jesus G. Boticario
  • Gadea Lucas Pérez
  • Ana Serrano-Mamolar
  • Álvar Arnaiz-González
  • Miguel Arevalillo-Herráez
  • David Arnau
  • Pablo Arnau-González
  • Raúl Fernández-Matellán
  • David Martin Gomez

Developing personalized systems requires architectures that ensure adaptability, explainability, and ethical compliance while maintaining user engagement and trust. To assess whether a system meets these principles, this article puts to the test a novel framework, named CARAIX (Collaborative, Adaptive, and Responsible Artificial Intelligence (AI) assisted by eXplainability), designed to develop intelligent systems with a human-centered approach and to support real-time feedback and bias-aware AI decision-making. CARAIX is inspired by the principles of the Hybrid Intelligence (HI) paradigm and emphasizes the integration of explainable AI techniques into the development process to enhance user interaction and system reliability. This paper analyses, using a peer-validated rubric, how the dimensions of the HI paradigm are integrated across four diverse, real-world learning scenarios: intelligent tutoring systems, psychomotor skill acquisition, autonomous driving training, and the acquisition of occupational safety competences. CARAIX is designed for scalability and reuse, facilitating integration into various AI-driven educational domains. We aim to share its potential for sustainable and ethically sound AI-enhanced multidisciplinary learning environments and for assessing whether a system complies with HI principles.

Explainable Sentiment Analysis through Counterfactual Reasoning

  • Simone Prete
  • Giovanni Maria Biancofiore
  • Fedelucio Narducci
  • Eugenio Di Sciascio
  • Tommaso Di Noia

Sentiment Analysis (SA) has proven to be an effective tool for recognizing opinions in text. However, the mechanisms by which these models arrive at specific predictions often remain unclear. This paper explores how eXplainable Artificial Intelligence (XAI) techniques can enhance interpretability in sentiment classification. Specifically, we leverage SHAP (SHapley Additive exPlanations) and counterfactual generation to identify words influencing sentiment predictions in movie reviews. Our approach integrates a neural classifier and generates counterfactual examples to reveal how slight text modifications affect model decisions. Experimental results show that SHAP-based attribution and counterfactual analysis provide deeper insights into the linguistic factors driving sentiment classification.
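
A greedy counterfactual search makes the second half of this pipeline concrete. The sketch below is not the authors’ implementation: the toy reviews, the TF-IDF classifier, and the deletion-only edit operation are assumptions. It repeatedly removes the word whose deletion most weakens the current prediction until the sentiment label flips:

```python
# Illustrative counterfactual-by-deletion search over a toy sentiment model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["a moving, brilliantly acted film", "dull plot and wooden dialogue",
           "an instant classic", "tedious and forgettable"]
labels = [1, 0, 1, 0]  # toy stand-ins for labeled movie reviews

clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(reviews, labels)

def counterfactual_by_deletion(text: str):
    """Greedily drop the word whose removal most lowers the probability of
    the current label, stopping once the predicted label flips."""
    words = text.split()
    label = clf.predict([text])[0]
    while words:
        candidates = [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]
        probs = clf.predict_proba(candidates)[:, label]
        best = int(probs.argmin())
        if clf.predict([candidates[best]])[0] != label:
            return candidates[best]  # minimal edit that changes the prediction
        words = candidates[best].split()
    return None  # no sequence of single-word deletions flips the label

print(counterfactual_by_deletion("a brilliantly acted but tedious film"))
```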

Leveraging Generative AI to Improve Comprehensibility in Social Recommender Systems

  • Md Ashaduzzaman
  • Chun-Hua Tsai

Generative AI, particularly Large Language Models (LLMs), has revolutionized human-computer interaction by enabling the generation of nuanced, human-like text. This presents new opportunities, especially in enhancing explainability for AI systems like recommender systems, a crucial factor for fostering user trust and engagement. LLM-powered AI-Chatbots can be leveraged to provide personalized explanations for recommendations. Although users often find these chatbot explanations helpful, they may not fully comprehend the content. Our research focuses on assessing how well users comprehend these explanations and identifying gaps in understanding. We also explore the key behavioral differences between users who effectively understand AI-generated explanations and those who do not. We designed a three-phase user study with 17 participants to explore these dynamics. The findings indicate that the clarity and usefulness of the explanations are contingent on the user asking relevant follow-up questions and having a motivation to learn. Comprehension also varies significantly based on users’ educational backgrounds.

Modeling Musical Genre Trajectories through Pathlet Learning

  • Lilian Marey
  • Charlotte Laclau
  • Bruno Sguerra
  • Tiphaine Viard
  • Manuel Moussallam

The increasing availability of user data on music streaming platforms opens up new possibilities for analyzing music consumption. However, understanding the evolution of user preferences remains a complex challenge, particularly as their musical tastes change over time. This paper uses the dictionary learning paradigm to model user trajectories across different musical genres. We define a new framework that captures recurring patterns in genre trajectories, called pathlets, enabling the creation of comprehensible trajectory embeddings. We show that pathlet learning reveals relevant listening patterns that can be analyzed both qualitatively and quantitatively. This work improves our understanding of users’ interactions with music and opens up avenues of research into user behavior and into fostering diversity in recommender systems. A dataset of 2,000 user histories tagged by genre over 17 months, supplied by Deezer (a leading music streaming company), is also released with the code.
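
The dictionary-learning paradigm the paper builds on can be sketched in a few lines. The snippet below is illustrative: the authors’ pathlet formulation is more specialized, and the synthetic data, trajectory encoding, and hyperparameters are assumptions. It learns a small dictionary of recurring trajectory patterns together with sparse per-user codes:

```python
# Toy dictionary learning over genre trajectories: each user is a sequence of
# monthly listening shares across genres, flattened into one vector.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_users, n_months, n_genres = 200, 17, 6
X = rng.dirichlet(np.ones(n_genres), size=(n_users, n_months)).reshape(n_users, -1)

# Learn recurring trajectory patterns ("pathlets") and sparse codes that
# express each user as a combination of those patterns.
dl = DictionaryLearning(n_components=8, transform_algorithm="lasso_lars",
                        transform_alpha=0.1, random_state=0)
codes = dl.fit_transform(X)                       # (n_users, 8) embeddings
pathlets = dl.components_.reshape(8, n_months, n_genres)
```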

PHaSE Project - Promoting Healthy and Sustainable Eating through Interactive and Explainable AI Methods

  • Cataldo Musto
  • Amon Rapp
  • Ludovico Boratto

The PHaSE project promotes healthier and more sustainable eating habits through the integration of recommender systems and conversational agents. It aims to enhance users’ awareness and understanding of responsible eating by enabling them to explore the healthiness of recipes, identify more sustainable ingredient substitutes, and receive personalized advice.

"They Only Offer the Illusion of Choice": Exploring User Perceptions of Control and Agency on YouTube

  • Muheeb Faizan Ghori
  • Arman Dehpanah
  • Jonathan Gemmell
  • Bamshad Mobasher

Recommender systems (RS) have gained widespread adoption in digital platforms across a variety of domains. However, how these systems function, and how users might adjust them, typically remains opaque to users. When users perceive a lack of ability to control or personalize the system, this can lead to a loss of trust and lower perceived usefulness of the RS. In this study, we explore user perceptions of control over YouTube and the strategies users employ to exercise agency. Using a thematic analysis of 200 discussion threads from Reddit, this study examines how users exercise agency, drawing on self-reported user experiences with YouTube’s recommender system. Our findings provide insights into users’ understanding of the various control mechanisms and their ability to align the system with their personal preferences.

Towards Explainable Temporal User Profiling with LLMs

  • Milad Sabouri
  • Masoud Mansoury
  • Kun Lin
  • Bamshad Mobasher

Accurately modeling user preferences is vital not only for improving recommendation performance but also for enhancing transparency in recommender systems. Conventional user-profiling methods—such as averaging item embeddings—often overlook the evolving, nuanced nature of user interests, particularly the interplay between short-term and long-term preferences. In this work, we leverage large language models (LLMs) to generate natural language summaries of users’ interaction histories, distinguishing recent behaviors from more persistent tendencies. Our framework not only models temporal user preferences but also produces natural language profiles that can be used to explain recommendations in an interpretable manner. These textual profiles are encoded via a pre-trained model, and an attention mechanism dynamically fuses the short-term and long-term embeddings into a comprehensive user representation. Beyond boosting recommendation accuracy over multiple baselines, our approach naturally supports explainability: the interpretable text summaries and attention weights can be exposed to end users, offering insights into why specific items are suggested. Experiments on real-world datasets underscore both the performance gains and the promise of generating clearer, more transparent justifications for content-based recommendations.
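
The fusion step can be pictured with a minimal sketch: two profile embeddings, one encoding the short-term summary and one the long-term summary, are combined by learned attention weights. The module below is an assumption-laden illustration (embedding dimension, scorer design, and data are invented), not the paper’s architecture:

```python
# Attention over two profile "views"; the learned weights double as a signal
# of whether short-term or long-term interests drove a recommendation.
import torch
import torch.nn as nn

class ProfileFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # scores each profile view

    def forward(self, short_emb: torch.Tensor, long_emb: torch.Tensor):
        views = torch.stack([short_emb, long_emb], dim=1)       # (batch, 2, dim)
        weights = torch.softmax(self.scorer(views).squeeze(-1), dim=1)
        user_repr = (weights.unsqueeze(-1) * views).sum(dim=1)  # (batch, dim)
        return user_repr, weights

fusion = ProfileFusion(dim=384)
short_emb, long_emb = torch.randn(4, 384), torch.randn(4, 384)
repr_, w = fusion(short_emb, long_emb)  # w[i] shows short- vs long-term emphasis
```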

Towards Personalised and User-Friendly Counterfactual Sequences for Failure Correction

  • Jasmina Gajcin
  • Jovan Jeromela
  • Ivana Dusparic

Understanding how failures occur and how they can be corrected is essential for debugging, maintaining user trust, and developing personalised policies. Counterfactual sequences, which provide an alternative sequence of actions that delivers an improved outcome, have been used to correct failure in sequential decision-making tasks. However, prior work on counterfactual sequences has focused primarily on the algorithmic side of sequence generation, mostly overlooking the potential of counterfactuals as a user-friendly explanation method. In this work, we lay the groundwork for human-centred and personalised counterfactual sequence generation. Informed by insights from psychology and cognitive science, we propose a set of desiderata for understandable and useful counterfactual sequences. We then introduce an algorithm based on these desiderata that generates diverse counterfactual sequences, enabling the user to correct the failure in line with their preferences.

Towards Understanding Persuasive and Personalized Engagement for Human-AI Reliance

  • Muhammad Raees
  • Vassilis-Javed Khan
  • Konstantinos Papangelis

AI assistance can be dynamically adapted to persuade users to build reliance on AI systems. Personalizing AI assistance based on users’ latent traits and real-time behavior can also improve human-AI collaborative decision-making. However, there is limited exploration in the literature on personalizing AI assistance to user traits and behavior. Understanding how users engage and interact with personalized explanations from the lens of reducing over-reliance is also underexplored. In this position paper, we present a rationale for personalized and persuasive interventions to build appropriate reliance and enhance user engagement with AI assistance. We examine the current literature and argue that user-centric persuasion and engagement improve analytical system evaluation and foster reliance on AI assistance. Considering persuasive and personalized AI assistance, we posit a study design for user-centered engagement to improve appropriate reliance.

What If the Prompt Were Different? Counterfactual Explanations for the Characteristics of Generative Outputs

  • Sofie Goethals
  • Joao Sedoc
  • Foster Provost

As generative AI systems become increasingly integrated into real-world applications, the need to analyze and interpret their outputs grows in importance. This paper addresses the challenge of assessing whether generative outputs exhibit specific characteristics—such as toxicity, a certain sentiment, or bias. We borrow a concept from the traditional Explainable AI literature, counterfactual explanations, but argue that it needs to be significantly rethought. We propose a flexible framework that extends counterfactual explanations to non-deterministic generative AI systems, specifically in scenarios where downstream classifiers can reveal characteristics of their outputs.

SESSION: 7th UMAP Workshop on Fairness in User Modeling, Adaptation, and Personalization (FairUMAP 2025)

7th UMAP Workshop on Fairness in User Modeling, Adaptation, and Personalization (FairUMAP 2025)

  • Bamshad Mobasher
  • Styliani Kleanthous
  • Robin Burke
  • Avital Shulner-Tal
  • Tsvi Kuflik

A Fair Share: Fair Allocation of Satellite Observation Windows According to User Preferences in a Distributed Setting

  • Shai Krigman
  • Lihi Dery
  • Tal Grinshpoun

When users with different preferences and entitlements compete for limited access to a shared resource, maximizing total utility can lead to significant disparities in how users are served. We examine this tension in the context of allocating satellite observation windows, where users differ in their willingness to pay or their contribution to the system. The goal is to schedule observations efficiently while promoting balanced, fair access among users. This challenge is amplified in settings where coordination is decentralized and users negotiate outcomes without a central authority. We propose a hybrid algorithm designed to balance fairness and efficiency in distributed scheduling. Our method produces allocations that retain high efficiency while reducing inequality. Although developed for satellite scheduling, the algorithm applies more broadly to decentralized systems where users with heterogeneous preferences share limited resources.

Balancing Health Information-Seeking through Retrieval-Augmented Generation-Based LLM Chatbot

  • Gargi Nandy
  • Srishti Gupta
  • Farhad Mohammad Afzali
  • Eric Peeples
  • Betsy Pilon
  • Chun-Hua Tsai

Family caregivers play a vital role in supporting children with chronic health conditions, such as neonates diagnosed with hypoxic-ischemic encephalopathy (HIE). However, navigating complex medical information can be overwhelming due to the quantity and quality of available literature. This study leverages Retrieval-Augmented Generation (RAG)-based Large Language Models (LLMs) to develop a chatbot that integrates peer-reviewed scientific literature and provides personalized, simplified summaries for caregivers. A user study involving six caregivers and five healthcare providers demonstrated the chatbot’s ability to enhance clarity, improve comprehension, and deliver essential medical information concisely. Our findings highlight the potential of RAG-based LLMs to enhance caregivers’ health literacy and support their information-seeking behavior, while also underscoring the importance of thoughtfully navigating the differing expectations of caregivers and healthcare providers regarding the type, depth, and presentation of medical information.
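
A schematic of the underlying RAG loop is given below. Here `embed` and `generate` are hypothetical stand-ins for an embedding model and an LLM API, and the prompt wording is an assumption, not the study’s actual prompt:

```python
# Skeleton of retrieval-augmented generation over peer-reviewed excerpts.
import numpy as np

def retrieve(query: str, chunks: list[str], embed, k: int = 3) -> list[str]:
    q = embed(query)
    vecs = np.array([embed(c) for c in chunks])
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-sims)[:k]]

def answer(query: str, chunks: list[str], embed, generate) -> str:
    context = "\n".join(retrieve(query, chunks, embed))
    prompt = ("You are assisting a family caregiver of a child with HIE. "
              "Using only the excerpts below from peer-reviewed literature, "
              "answer in plain, non-technical language.\n\n"
              f"Excerpts:\n{context}\n\nQuestion: {query}")
    return generate(prompt)
```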

Reducing the Emotional Distress of Content Moderators through LLM-based Target Substitution in Implicit and Explicit Hate-Speech

  • Nazanin Jafari
  • James Allan

Hate speech is often subtle and context-dependent, making it especially difficult to detect, particularly when interpretation requires contextual familiarity with the targeted group. Exposure to hate speech and toxic content can lead to significant psychological harm, including increased stress and anxiety, and content moderators are particularly vulnerable due to their sustained exposure to such harmful material. This work explores the role of personalization in content moderation by examining how alignment between a moderator’s background and the targeted group affects emotional and cognitive responses. We propose a target substitution method that replaces references to real communities in hate speech with fictional characters, aiming to reduce emotional distress while preserving the semantic integrity necessary for accurate moderation. Through both automated and human evaluations, we find that substitution significantly reduces emotional distress across all groups, with a trade-off in accuracy. Moreover, we observe that moderators demonstrate higher accuracy when moderating content aligned with their own demographic background, even after substitution. This suggests the key role of contextual familiarity in interpreting implicit hate. Additionally, our study highlights the cumulative impact of prolonged exposure to hate speech, showing that moderators experience increased emotional distress over time, particularly in non-targeted scenarios. Despite this, target substitution consistently mitigates distress while maintaining moderation efficacy.

Searching for BLM: Google Search Results During the 2020 Black Lives Matter Protests

  • Chau Tong

The nationwide surge of Black Lives Matter (BLM) protests that followed George Floyd's murder in May 2020 offers a rare, large-scale natural experiment for interrogating fairness in information access. We combine three complementary data sources: (1) an original U.S. survey linking political attitudes to self-reported queries; (2) state-level Google Trends signals; and (3) a collection of 1,500 Google Search ranked URLs elicited with attitude-conditioned queries, to trace how user intent and algorithmic ranking jointly shape what people see when they “search for BLM”. The analyses reveal three fairness-relevant patterns. First, survey respondents who opposed BLM reported different queries (e.g., “protester violence”) than supporters (“equality”), indicating that query formulation is shaped by political stance. Second, aggregate Trends data showed that BLM-supportive states generated more search traffic for BLM-affirming queries than states with lower support, indicating politically slanted collective search interest. Third, result-page audits found a slight left-of-center domain bias, even for anti-BLM queries, while topic modeling showed subtly distinct content framings depending on the queries’ stance. Taken together, the study provides empirical evidence that can inform fairness interventions and design implications for adaptive systems to anticipate and counteract ideologically skewed information pathways.

Simulating the Algorithm Store: Multistakeholder Impacts of Recommender Choice

  • Anas Buhayh
  • Elizabeth McKinnie
  • Clement Canel
  • Robin Burke

Recommender systems play an essential role in connecting users with items. Traditionally, research in this field has focused on refining recommendation algorithms within monolithic systems that reside in a single platform. We are exploring alternative architectures in which users have a choice over recommendation algorithms. In this work, we use simulation grounded in real-world data to explore the impact of such alternative designs on recommendation stakeholders. We show that consumers of niche items and producers of such items can both benefit from algorithmic choice.

User and Recommender Behavior Over Time: Contextualizing Activity, Effectiveness, Diversity, and Fairness in Book Recommendation

  • Samira Vaez Barenji
  • Sushobhan Parajuli
  • Michael D. Ekstrand

Data is an essential resource for studying recommender systems. While there has been significant work on improving and evaluating state-of-the-art models and measuring various properties of recommender system outputs, less attention has been given to the data itself, particularly how data has changed over time. Such documentation and analysis provide guidance and context for designing and evaluating recommender systems, particularly for evaluation designs making use of time (e.g., temporal splitting). In this paper, we present a temporal explanatory analysis of the UCSD Book Graph dataset scraped from Goodreads, a social reading and recommendation platform active since 2006. We measure the book interaction data using a set of activity, diversity, and fairness metrics; we then train a set of collaborative filtering algorithms on rolling training windows to observe how the same measures evolve over time in the recommendations. Additionally, we explore whether the introduction of algorithmic recommendations in 2011 was followed by observable changes in user or recommender system behavior.

When to Ask a Question: Understanding Communication Strategies in Generative AI Tools

  • Charlotte Park
  • Kate Donahue
  • Manish Raghavan

Generative AI tools (GAITs) fundamentally differ from traditional machine learning tools in that they allow users to provide as much or as little information as they choose in their inputs. This flexibility often leads users to omit certain details, relying on the GAIT to infer and fill in less critical information based on distributional knowledge of user preferences. Inferences about preferences lead to natural questions about fairness, since a GAIT’s “best guess” may skew towards the preferences of larger groups at the expense of smaller ones. Unlike more traditional recommender systems, GAITs can acquire additional information about a user’s preferences through feedback or by explicitly soliciting it. This creates an interesting communication challenge: the user is aware of their specific preference, while the GAIT has knowledge of the overall distribution of preferences, and both parties can only exchange a limited amount of information. In this work, we present a mathematical model to describe human-AI co-creation of content under information asymmetry. Our results suggest that GAITs can use distributional information about overall preferences to determine the “right” questions to ask to maximize both welfare and fairness, opening up a rich design space in human-AI collaboration.

XAI4RE – Using Explainable AI for Responsible and Ethical AI

  • Avital Shulner-Tal
  • Julia Sheidin

As Artificial Intelligence (AI) systems are increasingly integrated into high-stakes domains, the demand for transparency has become paramount. The opacity of "black-box" models poses significant challenges in trust, fairness, and accountability. Explainable AI (XAI) is a vital approach for addressing these concerns by enabling transparency, fostering trust, and ensuring ethical deployment across various sectors, including healthcare, human resources, finance, autonomous systems, and more. This paper explores how XAI methods can be used throughout the AI lifecycle for creating human-centered, ethical, and responsible AI systems by enhancing transparency, reducing bias, and protecting data privacy. Furthermore, the paper introduces XAI4RE, a theoretical framework that links XAI principles and purposes to concrete stages of the AI lifecycle, demonstrating how to address ethical considerations effectively. This approach involves engaging different stakeholders, such as developers, regulators, and users, at each stage. The framework highlights the critical role of XAI in promoting fairness, accountability, and human-centric design using general guidelines that discuss the relevant insights that can be drawn from XAI at each lifecycle stage. Ultimately, this paper underscores the importance of XAI in bridging the gap between technical advancements and ethical AI practices to foster societal trust and responsible systems.

SESSION: 4th Workshop on Group Modeling, Adaptation and Personalization (GMAP 2025)

GMAP 2025: 4th Workshop on Group Modeling, Adaptation and Personalization

  • Francesco Barile
  • Amra Delić
  • Ladislav Peska
  • Isabella Saccardi
  • Cedric Waterschoot

Group Recommender Systems (GRSys) are designed to recommend items that address the needs of groups of people. Compared to individual users, groups are dynamic entities where interpersonal relationships, group dynamics, emotional contagion, etc., substantially affect the group’s needs. Nevertheless, these characteristics are often poorly defined or overlooked in system modeling. The fourth GMAP workshop brought together a community of scholars focused on group modeling, adaptation, and personalization. The event was dedicated to exploring the challenges and opportunities of supporting collective decision-making, fostering interdisciplinary dialogue, and forging new collaborations. The four presented papers covered a diverse range of topics: (i) an exploratory analysis of LLM applications to group meeting transcripts, (ii) an extensive review of the growing methodological divide in group recommender systems, (iii) a novel application of group modeling for personalizing public displays, and (iv) a detailed examination of prompt design for group recommendations using LLMs.

Bridging the Rift: A Critical Perspective on the Divergence in Group Recommender Systems Research

  • Ladislav Peska
  • Amra Delic
  • Francesco Barile
  • Patrik Dokoupil

Group recommender systems (GRSys) focus on the challenges of recommending to groups of users with possibly contradicting needs and preferences. Methodologically, we distinguish between approaches aiming to aggregate preferences of group members and aggregating per-user recommendations. In early GRSys research, this methodological duality did not affect the connected research objectives and evaluation methodology much. However, nowadays, we witness a gradual rift in the research induced by both algorithm classes. In this work, based on a survey of 110 recent GRSys papers, we aim to quantify this rift along several aspects, including involved communities, evaluation datasets, objectives, and baselines. We showcase how little both subtrees have in common nowadays and discuss missed opportunities this rift causes. In conclusion, we also highlight novel research avenues that may contribute towards bridging the rift to the benefit of both research areas.

Group Modeling Cultural Dimension Values for Intercultural Personalization

  • Laura Stojko

Large interactive displays in semi-public areas are shared by diverse users whose cultural backgrounds influence how they perceive user interfaces. Personalizing such interfaces with cultural differences in mind requires aggregating individual cultural user models (based on Hofstede’s cultural dimensions) into a group profile. This paper investigates the applicability of group modeling strategies for this purpose as a preliminary exploration, addressing the unique characteristics of cultural dimension values, which differ from traditional numeric ratings. Using an example dataset representing an intercultural group, strategies were identified to rank cultural dimensions and aggregate their values. Borda count produced the clearest ranking, while average without misery and fairness emerged as promising value aggregation strategies. These findings demonstrate how group modeling can support intercultural personalization of shared interfaces and extend the use of these strategies to other types of preference values.
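
The two most promising strategies can be illustrated with a toy group profile (the numbers below are invented; Hofstede dimension values are commonly reported on a 0-100 scale):

```python
# Toy aggregation of per-user cultural dimension values into a group profile.
profiles = {
    "u1": {"power_distance": 35, "individualism": 80, "uncertainty_avoidance": 46},
    "u2": {"power_distance": 68, "individualism": 20, "uncertainty_avoidance": 85},
    "u3": {"power_distance": 40, "individualism": 60, "uncertainty_avoidance": 70},
}
dims = list(next(iter(profiles.values())))

# Borda count: each user ranks the dimensions by value; ranks are summed.
borda = {d: 0 for d in dims}
for vals in profiles.values():
    for rank, d in enumerate(sorted(dims, key=vals.get)):  # low value, low rank
        borda[d] += rank
ranking = sorted(dims, key=borda.get, reverse=True)

# Average without misery: average a dimension only if no member's value
# falls below a "misery" threshold (the threshold is an assumption).
threshold = 25
awm = {d: sum(p[d] for p in profiles.values()) / len(profiles)
       for d in dims if all(p[d] >= threshold for p in profiles.values())}
```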

The Pitfalls of Growing Group Complexity: LLMs and Social Choice-Based Aggregation for Group Recommendations

  • Cedric Waterschoot
  • Nava Tintarev
  • Francesco Barile

Large Language Models (LLMs) are increasingly applied in recommender systems aimed at both individuals and groups. Previously, Group Recommender Systems (GRS) often used social choice-based aggregation strategies to derive a single recommendation based on the preferences of multiple people. In this paper, we investigate under which conditions language models can perform these strategies correctly based on zero-shot learning and analyse whether the formatting of the group scenario in the prompt affects accuracy. We specifically focused on the impact of group complexity (number of users and items), different LLMs, different prompting conditions, including In-Context learning or generating explanations, and the formatting of group preferences. Our results show that performance starts to deteriorate when considering more than 100 ratings. However, not all language models were equally sensitive to growing group complexity. Additionally, we showed that In-Context Learning (ICL) can significantly increase the performance at higher degrees of group complexity, while adding other prompt modifications, specifying domain cues or prompting for explanations, did not impact accuracy. We conclude that future research should include group complexity as a factor in GRS evaluation due to its effect on LLM performance. Furthermore, we showed that formatting the group scenarios differently, such as rating lists per user or per item, affected accuracy. All in all, our study implies that smaller LLMs are capable of generating group recommendations under the right conditions, making the case for using smaller models that require less computing power and incur lower costs.
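
As a concrete illustration of the formatting factor studied here, the sketch below renders the same toy group preferences as a rating list per user and as a rating list per item before embedding them in a zero-shot aggregation prompt (the ratings and prompt wording are assumptions):

```python
# Two renderings of one group scenario for a zero-shot aggregation prompt.
ratings = {"Alice": {"Film A": 4, "Film B": 1}, "Bob": {"Film A": 2, "Film B": 5}}

by_user = "\n".join(f"{u}: " + ", ".join(f"{i}={r}" for i, r in rs.items())
                    for u, rs in ratings.items())
items = sorted({i for rs in ratings.values() for i in rs})
by_item = "\n".join(f"{i}: " + ", ".join(f"{u}={ratings[u][i]}" for u in ratings)
                    for i in items)

prompt = ("Apply the Least Misery strategy to the group ratings below and "
          "return the item the group should watch.\n" + by_user)  # or by_item
```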

Towards Group Decision Support with LLM-based Meeting Analysis

  • Sebastian Lubos
  • Alexander Felfernig
  • Damian Garber
  • Viet-Man Le
  • Manuel Henrich
  • Reinhard Willfort
  • Jeremias Fuchs

In many situations, groups of people need to collaborate to achieve a shared goal or solve a common problem. However, reaching an agreement can be inefficient, and critical perspectives can be overlooked during lengthy discussions. To improve this situation, this paper introduces a practical approach that uses an LLM to analyze recorded group discussions and provide informed recommendations. It analyzes meeting transcripts to identify discussed options, summarize outcomes, track decision dynamics, and generate helpful recommendations. This automation could save time, enhance transparency, and improve productivity. Through real-world case studies, we evaluate the approach to explore the strengths and limitations of using LLMs to support group decision-making.

SESSION: Hybrid AI for Human-Centric Personalization (HyPer)

Hybrid AI for Human-Centric Personalization (HyPer)

  • Elisabeth Lex
  • Kevin Innerebner
  • Marko Tkalcic
  • Dominik Kowald
  • Markus Schedl

Hybrid AI, which integrates symbolic and sub-symbolic methods, has emerged as a promising paradigm for advancing human-centric personalization. By combining machine learning with structured knowledge representations, hybrid AI enables interpretable and adaptive user models that account for human factors such as biases, mental models, and affective states. The HyPer workshop provides a venue to discuss how hybrid AI approaches, combining neural architectures, symbolic representations, and cognitive/behavioral frameworks, can bridge the gap between explainability, cognitive modeling, and automated adaptation to user preferences.

Building Human-AI Reliance Through Cognitive Engagement and Exploratory AI Assistance

  • Muhammad Raees
  • Vassilis-Javed Khan
  • Konstantinos Papangelis

AI assistance is increasingly used to improve human-AI collaborative decision-making. However, how domain experts integrate their knowledge with grounded constraints and formulate intent with AI systems remains underexplored. In this position paper, we argue for “cognitively aligned” AI assistance, where users engage interactively with symbolic (logic-based) and sub-symbolic AI to interpret, influence, and co-construct decisions. Through this lens, we believe that users can build effective reliance on AI assistance, iteratively anchoring their domain knowledge to adapt their mental models and AI assistance. We explore the current literature and emphasize the need for cognitive (analytical) engagement with AI assistance to improve semantic alignment and interactive affordances for domain experts. We outline a plan for a research study that explores users’ interaction with AI assistance and quantitative reasoning in business decision-making.

Differentiable Fuzzy Neural Networks for Recommender Systems

  • Stephan Bartl
  • Kevin Innerebner
  • Elisabeth Lex

As recommender systems become increasingly complex, transparency is essential to increase user trust, accountability, and regulatory compliance. Neuro-symbolic approaches that integrate symbolic reasoning with sub-symbolic learning offer a promising approach toward transparent and user-centric systems. In this work-in-progress, we investigate using fuzzy neural networks (FNNs) as a neuro-symbolic approach for recommendations that learn logic-based rules over predefined, human-readable atoms. Each rule corresponds to a fuzzy logic expression, making the recommender’s decision process inherently transparent. In contrast to black-box machine learning methods, our approach reveals the reasoning behind a recommendation while maintaining competitive performance. We evaluate our method on a synthetic dataset and the MovieLens 1M dataset and compare it to state-of-the-art recommendation algorithms. Our results demonstrate that our approach accurately captures user behavior while providing a transparent decision-making process. Finally, the differentiable nature of this approach facilitates integration with other neural models, enabling the development of hybrid, transparent recommender systems.
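
One way to picture such a differentiable fuzzy rule is the sketch below (not the paper’s architecture; the weighted product t-norm and the example atoms are assumptions), in which each learned weight controls how strongly an atom participates in a soft AND:

```python
# A soft, differentiable AND over human-readable atoms in [0, 1].
import torch
import torch.nn as nn

class FuzzyAndRule(nn.Module):
    """Weighted product t-norm: prod_i atoms_i ** sigmoid(w_i).
    A weight near zero effectively drops atom i from the rule."""
    def __init__(self, n_atoms: int):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_atoms))  # learned atom relevance

    def forward(self, atoms: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.w)
        return torch.exp((g * torch.log(atoms.clamp_min(1e-6))).sum(dim=-1))

# Atoms, e.g.: [likes_action, likes_90s_films, watched_the_sequel].
rule = FuzzyAndRule(3)
score = rule(torch.tensor([[0.9, 0.4, 1.0]]))  # rule strength for one user-item pair
```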

Hybrid Personalization Using Declarative and Procedural Memory Modules of the Cognitive Architecture ACT-R

  • Kevin Innerebner
  • Dominik Kowald
  • Markus Schedl
  • Elisabeth Lex

Recommender systems often rely on sub-symbolic machine learning approaches that operate as opaque black boxes. These approaches typically fail to account for the cognitive processes that shape user preferences and decision-making. In this vision paper, we propose a hybrid user modeling framework based on the cognitive architecture ACT-R that integrates symbolic and sub-symbolic representations of human memory. Our goal is to combine ACT-R’s declarative memory, which is responsible for storing symbolic chunks along with their sub-symbolic activations, with its procedural memory, which contains symbolic production rules. This integration will help simulate how users retrieve past experiences and apply decision-making strategies. With this approach, we aim to provide more transparent recommendations, enable rule-based explanations, and facilitate the modeling of cognitive biases. We argue that our approach has the potential to inform the design of a new generation of human-centered, psychology-informed recommender systems.
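
For readers unfamiliar with ACT-R’s sub-symbolic layer, the base-level activation of a declarative chunk is commonly computed as B_i = ln(Σ_j t_j^(-d)), where t_j is the time since the chunk’s j-th past use and d is a decay parameter (0.5 by default). A minimal sketch:

```python
# Base-level activation of an ACT-R declarative chunk: frequently and
# recently used chunks are more active, hence easier to retrieve.
import math

def base_level_activation(use_times: list[float], now: float, d: float = 0.5) -> float:
    return math.log(sum((now - t) ** -d for t in use_times if t < now))

# A chunk (e.g., a previously consumed item) used 50h, 5h, and 1h ago:
print(base_level_activation([0.0, 45.0, 49.0], now=50.0))
```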

MoRTELaban: a Neurosymbolic Framework for Motion Representation and Analysis based on Labanotation and Laban Movement Analysis

  • Roberto Perez-Martinez
  • Alberto Casas-Ortiz
  • Olga C. Santos

Human motion cannot be fully modeled by subsymbolic representations. While these extract precise hidden patterns from motion data, they are often task-specific and lack a semantic understanding of motion. Symbolic systems that mirror human cognition and explicit expressive processes are necessary for richer motion synthesis and analysis, enabling physical reasoning and expert knowledge encoding. In this work, we propose a neurosymbolic framework that combines Labanotation and Laban Movement Analysis (LMA), originally developed for dance, to represent and analyze human motion symbolically. We expand the existing LabanEditor to support full-body annotation and integrate it with AMASS, Mediapipe, and Kinect inputs through a SMPL-based format. Our system supports automatic annotation of the local functional and expressive aspects of motion, and enables bidirectional conversion between symbols and motion. While still a work in progress, this framework lays the groundwork for explainable, expressive motion modeling that can support human-robot interaction, motion preservation, and psychomotor learning systems.

Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents

  • Schaun Wheeler
  • Olivier Jeunen

Large Language Models (LLMs) represent a landmark achievement in Artificial Intelligence (AI), demonstrating unprecedented proficiency in procedural tasks such as text generation, code completion, and conversational coherence. These capabilities stem from their architecture, which mirrors human procedural memory—the brain’s ability to automate repetitive, pattern-driven tasks through practice. However, as LLMs are increasingly deployed in real-world applications, it becomes impossible to ignore their limitations operating in complex, unpredictable environments. This paper argues that LLMs, while transformative, are fundamentally constrained by their reliance on procedural memory. To create agents capable of navigating “wicked” learning environments—where rules shift, feedback is ambiguous, and novelty is the norm—we must augment LLMs with semantic memory and associative learning systems. By adopting a modular architecture that decouples these cognitive functions, we can bridge the gap between narrow procedural expertise and the adaptive intelligence required for real-world problem-solving.

The Impact of Confidence Ratings on User Trust in Large Language Models

  • Lifei Wang
  • Natalie Friedman
  • Chengchao Zhu
  • Zeshu Zhu
  • S Joy Mountford

This study investigated how displaying AI confidence levels affected user trust and effectiveness in decision-making contexts. Current chatbot interfaces lack transparency in response reliability, which could lead to misguided trust in AI-generated content. We addressed this limitation through a confidence rating interface that visually communicates model certainty and provides prompt improvement suggestions. We conducted a between-subjects study (n=20) comparing a standard chatbot interface with a confidence rating interface that displays three features: 1) confidence rating, 2) confidence factors, and 3) prompt improvement suggestions. Participants completed tasks that could arise in an enterprise setting. These tasks included asking for travel planning suggestions, fact verification about unfamiliar topics, multi-step problem solving involving time zones, and decision-making about a stock’s value. While we didn’t reach statistical significance with this small sample size, results showed that the confidence rating interface tended to improve user effectiveness and confidence, particularly in tasks requiring verification or reasoning. Our findings suggest that combining confidence indicators with prompt suggestions could enhance information evaluation when working with AI systems, with implications for enterprise applications where trust is essential.

The Potential of AutoML for Recommender Systems

  • Tobias Vente
  • Lukas Wegmeth
  • Joeran Beel

Automated Machine Learning (AutoML) has significantly advanced Machine Learning (ML) applications, including model compression, machine translation, and computer vision. Recommender Systems (RecSys) can be seen as an application of ML. Yet AutoML has received little attention from the RecSys community, and RecSys has not received notable attention from the AutoML community. Only a few relatively simple Automated Recommender Systems (AutoRecSys) libraries exist that adopt AutoML techniques. However, these libraries are based on student projects and do not offer the features and thorough development of AutoML libraries. We set out to determine how AutoML libraries perform in the scenario of an inexperienced user who wants to implement a recommender system. We compared the predictive performance of 60 AutoML, AutoRecSys, ML, and RecSys algorithms from 15 libraries, including a mean predictor baseline, on 14 explicit feedback RecSys datasets. We found that AutoML and AutoRecSys libraries performed best. AutoML libraries performed best on six of the 14 datasets (43%), but the same AutoML library did not always perform best. The single best library was the AutoRecSys library Auto-Surprise, which performed best on five datasets (36%). On three datasets (21%), AutoML libraries performed poorly, and RecSys libraries with default parameters performed best. Although RecSys algorithms obtained 50% of all placements in the top five per dataset, they fell behind AutoML on average. ML algorithms generally performed the worst.

User Orientations and Stage-Specific Behaviors in E-commerce Exploratory Search: A Formative Study

  • Eunhye Kim
  • Kiroong Choe
  • Guangjing Yan
  • Mingyu Kang

Exploratory search begins without a specific goal, often leading to information overload and search fatigue as users attempt to understand, interpret, and retrieve information. While recent advances have enabled more sophisticated search agents, there remains a gap in understanding user behavior and cognitive processes during exploratory search in complex E-commerce environments, where users navigate through multiple information types while making rapid decisions. In this formative study (n=8), we collected browsing data and user feedback about search stages and information needs in a fashion e-commerce platform. Through qualitative analysis, we identified four distinct user orientations (brand, price, style, and popularity-driven) and mapped behaviors to specific exploratory search stages. These behavioral insights could inform future approaches to personalization that better align with users’ cognitive processes during different search stages, potentially contributing to the development of more human-centric systems for complex online shopping environments.

SESSION: The 1st Workshop on Sustainable and Trustworthy Large Language Models for Personalization (LLM4Good)

LLM4Good: The 1st Workshop on Sustainable and Trustworthy Large Language Models for Personalization

  • Thomas Elmar Kolb
  • Ashmi Banerjee
  • Ahmadou Wagne
  • Julia Neidhardt
  • Yashar Deldjoo

Large Language Models (LLMs) are transforming personalized services by enabling adaptive, context-aware recommendations and interactions. However, deploying these models at scale raises significant concerns about environmental impact, fairness, privacy, and trustworthiness, including high energy consumption, biased outputs, privacy breaches, and hallucinations. The LLM4Good workshop is a half-day workshop that addresses these challenges by fostering dialogue on sustainable and ethical approaches to LLM-based personalization. Participants will explore energy-efficient techniques, bias mitigation, privacy-preserving methods, and responsible deployment strategies. The workshop aligns with Sustainable Development Goals and Digital Humanism principles. It aims to guide the development of trustworthy, human-centric LLM systems that positively impact education, healthcare, and other domains.

Enhancing Mathematical Reasoning in GPT-J Through Topic-Aware Prompt Engineering

  • Lev Sukherman
  • Yetunde Folajimi

This study evaluates the effect of three prompting strategies (standard prompting, chain-of-thought (CoT) prompting, and informed CoT prompting) on the performance of the GPT-J model in solving mathematical reasoning tasks from the GSM8K dataset. Using the full test set of 1,319 problems, we assess the model’s performance through accuracy, F1 score, BLEU, and ROUGE metrics. The findings suggest that while providing relevant context, such as math topics, can modestly enhance performance, the gains are limited. This underscores the importance of carefully designing prompts in adaptive systems and indicates that additional strategies may be necessary to achieve practical utility in educational applications.
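
The three strategies differ only in the prompt template, as in the sketch below (the exact wording used in the study is an assumption; the sample question is GSM8K-style):

```python
# Three hypothetical prompt templates for the same math word problem.
question = ("Natalia sold clips to 48 of her friends in April, and then she "
            "sold half as many clips in May. How many clips did she sell "
            "altogether in April and May?")
topic = "multiplication and addition of whole numbers"  # the 'informed' cue

standard = f"Q: {question}\nA:"
cot = f"Q: {question}\nA: Let's think step by step."
informed_cot = (f"This problem involves {topic}.\n"
                f"Q: {question}\nA: Let's think step by step.")
```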

Exploring Responsible Use of Generative AI in Disaster Resilience for Indigenous Communities

  • Taranum Bano
  • Chun-Hua Tsai
  • Srishti Gupta
  • Yu-Che Chen
  • Edouardo Zendejas
  • Sarah Krafka

Tribal communities face unique challenges in disaster response, often lacking resources and infrastructure to effectively respond to emergencies. This study explores the potential of generative Artificial Intelligence (AI) to enhance disaster response within these communities. We designed a multi-modality generative AI system for disaster assessment from user-generated photos and organized reports with community in-kind cost sharing. We introduced the system prototype at the 2024 National Congress of American Indians (NCAI) conference with emergency department professionals from diverse tribal nations and other stakeholders. Through a workshop-focused group discussion, we discussed the perceptions, ideas, and concerns around introducing generative AI technology to tribal communities to increase disaster resilience. Our findings suggest considerations for developing strategies and possible governance models when introducing LLM-based models to marginalized local communities with limited resources. This research contributes to the literature on the potential and limitations of AI in supporting disaster preparedness and response within indigenous communities, ultimately informing strategies for enhanced tribal disaster resilience and sustainable development goals.

From Feedback to Formative Guidance: Leveraging LLMs for Personalized Support in Programming Projects

  • Fatemeh Ghoochani
  • Jonas Scharfenberger
  • Burkhardt Funk
  • Raoul Doublan
  • Mayur Jakharabhai Odedra
  • Bennet Etsiwah

Large Language Models (LLMs) offer scalable opportunities to personalize feedback in education, yet their trustworthiness and effectiveness remain underexplored. We present a study conducted in an introductory programming and data science course with approximately 1,400 first-year university students. A subset of these students received both peer and LLM-generated feedback on their individual programming projects. Our results show that 56% of students preferred the LLM feedback, and 52% could not reliably distinguish it from human-written feedback. Student ratings suggest that LLM feedback is perceived as helpful, constructive, and relevant, though it often lacks personalized depth and motivational nuance. These findings underline the potential of LLMs to support scalable, personalized education, while pointing to key areas for responsible improvement. Based on these insights, we outline the future roadmap for the course, in which LLM-generated feedback supports not only students in their learning journey but also instructors, by monitoring student performance and helping to allocate instructional resources more effectively. Given limited human resources, this approach enables personalized instructor feedback to be scaled to a large group of students.

Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements

  • Isamu Isozaki
  • Manil Shrestha
  • Rick Console
  • Edward Kim

Hacking poses a significant threat to cybersecurity, inflicting billions of dollars in damages annually. To mitigate these risks, ethical hacking, or penetration testing, is employed to identify vulnerabilities in systems and networks. Recent advancements in large language models (LLMs) have shown potential across various domains, including cybersecurity. However, there is currently no comprehensive, open, end-to-end penetration testing benchmark to drive progress and evaluate the capabilities of these models in security contexts. This paper introduces a novel open benchmark for LLM-based penetration testing, addressing this critical gap. We first evaluate the performance of LLMs, including GPT-4o and Llama 3.1-405B, using the state-of-the-art PentestGPT tool. Our findings reveal that while Llama 3.1 demonstrates an edge over GPT-4o, both models currently fall short of performing end-to-end penetration testing even with some minimal human assistance. Next, we advance the state of the art and present ablation studies that provide insights into improving the PentestGPT tool. Our research illuminates the challenges LLMs face in each aspect of penetration testing, e.g., enumeration, exploitation, and privilege escalation. This work contributes to the growing body of knowledge on AI-assisted cybersecurity and lays the foundation for future research in automated penetration testing using large language models.

Towards the Embodied Conversational Interview Agentic Service ELIAS: Development and Evaluation of a First Prototype

  • Tobias Budig
  • Marcia Nißen
  • Tobias Kowatsch

Interviews in the social and health sciences are resource-intensive and susceptible to interviewer bias, inconsistency, and variability across interviewers. Moreover, human-led interviews may inhibit participant openness, especially regarding sensitive topics, due to judgment, compromised anonymity, or discomfort in face-to-face interactions. These shortcomings limit the quality of the data collected. To this end, we propose the Embodied Conversational Interview Agentic Service (ELIAS). Informed by human-developed interview guides, ELIAS aims to streamline the interview process by combining an empathetic and bias-free embodied conversational interview agent with a semi-supervised content analysis and coding agent. We describe the development of the first version of ELIAS and present results from a first evaluation study with five participants, in which we assessed acceptance of, and alliance with, the embodied conversational interview agent. The evaluation shows positive perceptions and a strong alliance with the conversational agent. Suggestions for improvement will guide our future work.

“Which vocational training program is best for me?” – Design of a recommender system for school students using large language models

  • Alexander Piazza
  • Sigurd Schacht
  • Michael Herzog

School students need to make decisions about their career paths after graduating. In Germany, students can choose between more than 300 vocational training programs, which can be overwhelming, and they frequently hesitate to talk with career counselors. The objective of this research is therefore to provide a recommender system that supports school students’ decision-making: it builds on their interests and provides recommendations with explanations generated by an LLM. The system was developed with a social robot as the user interface to make it easy to use and appealing to the young target group. Based on user observations, preliminary findings indicate that the system is a valuable and engaging approach to supporting career counseling activities.

SESSION: The 16th International Workshop on Personalized Access to Cultural Heritage (PATCH 2025)

The 16th International Workshop on Personalized Access to Cultural Heritage (PATCH 2025)

  • Tsvi Kuflik
  • Alan J Wecker
  • Noemi Mauro
  • Liliana Ardissono

Following the success of previous editions, PATCH 2025 again serves as a meeting point at the intersection of cutting-edge cultural heritage research and personalized technologies. The workshop focuses on the use of ICT, both onsite and online, to enhance personal experiences in settings of natural and cultural heritage, with particular attention to ubiquitous and adaptive scenarios. PATCH 2025 brings researchers and practitioners from different disciplines together to explore how personalization and technology can enrich cultural heritage experiences. The workshop fosters the exchange of innovative ideas, encourages multidisciplinary dialogue, and aims to shape future research directions through collaboration. This summary provides an overview of the papers accepted for presentation and inclusion in the workshop proceedings, highlighting the latest advances and emerging trends in this dynamic field.

Exploring the Potential of Multimodal Large Language Models for Question Answering on Artworks

  • Alessio Ferrato
  • Carla Limongelli
  • Fabio Gasparetti
  • Giuseppe Sansonetti
  • Alessandro Micarelli

This paper investigates the application of a Multimodal Large Language Model to enhance visitor experiences in cultural heritage settings through Visual Question Answering (VQA) and Contextual Question Answering (CQA). We evaluate the zero-shot capabilities of LLaVA-7b (Large Language and Vision Assistant) on QA using the AQUA dataset. We assess how effectively it can answer questions about artworks, visual content, and contextual information through three experimental approaches. Our findings reveal that LLaVA demonstrates promising performance on visual questions, outperforming previous baselines, but faces challenges with questions requiring contextual understanding. The selective knowledge integration approach showed the best overall performance, suggesting that an efficient knowledge retrieval system could further enhance results. Moreover, we show how to exploit such models to provide correct, personalized answers using a well-established visitor model.

Knowledge Graph-based User Models and Personalized Access for Cultural Heritage

  • Stefano Ferilli

Cultural Heritage is opening up from the professional community to a wider public, generating increasing demand for culture and an associated economic turnover. This step requires differentiating the behavior of Cultural Heritage systems to deal with a wide variety of backgrounds, expectations, contexts, aims, educational and cultural levels, preferences, and interests. Computer Science and Artificial Intelligence can play a key role in this landscape, fine-tuning the fruition of cultural items to every kind of stakeholder and even to single users. In this paper, we present an approach to the personalization of Cultural Heritage fruition based on Knowledge Graphs. We propose an approach to describing user models and using them to extract personalized information, and we describe a platform that embeds this approach.

My Heritage Companion: An AI-Driven Mobile Experience for Visual Storytelling

  • Eiman Tamah Al-Shammari

My Heritage Companion is a mobile-first framework that reimagines cultural heritage engagement through ethically adaptive, simulation-based personalization. The system enables users to upload personal visual artifacts—such as sketches, heirlooms, or travel photographs—which serve as entry points for AI-informed, persona-driven storytelling. Rather than relying on behavioral tracking or social media integration, it employs a cold-start personalization approach using rule-based persona modeling to deliver cognitively accessible, culturally contextualized narratives. The framework integrates four core modules: image ingestion, simulated AI-based visual matching, persona-driven narrative adaptation, and privacy safeguards guided by the FATE principles (Fairness, Accountability, Transparency, and Ethics). The current prototype simulates AI behavior using real heritage imagery from Failaka Island—an archaeological site of multi-era significance spanning the Dilmun, Hellenistic, and early Islamic periods. This simulation pipeline validates user experience logic, interface adaptability, and narrative delivery across five user personas. My Heritage Companion advances digital museology by supporting inclusive access to cultural heritage in privacy-sensitive, low-infrastructure contexts. It demonstrates how mobile-first systems can ethically bridge personal memory and public history through adaptive storytelling—empowering users to become co-creators of heritage experiences.

Recommending Paintings in Web Art Gallery with Adjustable Popularity and Diversity

  • Rully Agus Hendrawan
  • Peter Brusilovsky
  • Bereket A. Yilma
  • Luis A. Leiva

The cold start problem remains a major challenge in visual art recommendation, where limited user feedback often forces systems to rely on content-based filtering. While effective with sufficient data, content similarity-based recommendation can reinforce filter bubbles, narrowing user exposure to mainstream content. Popularity and diversity are both critical factors in recommendation systems, as they impact the visibility of niche items and overall user satisfaction. Yet, existing platforms often rely on popularity-centric algorithms that may discourage exploration and overshadow lesser-known items. To address this gap, our work investigates whether users’ preferences for popular and diverse recommendations remain stable over short recommendation sessions. We propose an interactive, user-adjustable mechanism allowing individuals to control the balance between mainstream and novel suggestions in real time. We implement this approach within a Web art gallery recommender and, through a user study, examine changes in user behavior. Our findings suggest that while many users initially gravitate toward popular and diverse content, providing controls encourages later adjustments and exploratory behavior. This highlights the need for cultural institutions to move from a tightly managed centralized model to offering users greater affordances for managing the popularity and diversity of personalized recommendations.

Towards Cultural Preservation of Traditional Motion Knowledge through Automated Annotations with MoRTELaban

  • Roberto Perez-Martinez
  • Alberto Casas-Ortiz
  • Olga C. Santos

Movement disciplines like dance or martial arts are carriers of cultural knowledge, identity, and tradition. However, when this knowledge is preserved only through oral tradition and video recordings, it is susceptible to being lost. Expert movement notation, in turn, holds the potential for precise capture and knowledge inheritance. Yet motion notation approaches are not widespread, the process is often time-consuming, and the movements are hard to visualize without expert knowledge. In this work, we use Labanotation and Laban Movement Analysis (LMA), a notation system and method originally developed for dance, as a symbolic, interpretable framework for motion representation and preservation. Our contribution resides in the expansion of an existing annotation system, the LabanEditor, to handle full-body motion and data from multiple sources, and to support the work of experts in annotating movements. Our development, called MoRTELaban, supports motion-to-notation and inverse mapping from notation to keyframes, enabling exchange between video, motion capture, and Labanotation formats. This allows for the documentation and reconstruction of traditional motion practices using expert-readable scores and 3D skeletons.

User Models for Connected Personalized Avatars for Cultural Heritage Experiences

  • Alan Jay Wecker
  • Antonio Origlia
  • Tsvi Kuflik

User models can be enhanced with context-aware models of the user’s preferred avatar configuration. These models could be initialized by a set of rules connected to user personality to mitigate the cold start problem, and the resulting joint model is well suited to cultural heritage applications. This short paper explores the idea and discusses possible methods of evaluation.

SESSION: 2nd Workshop on Wearable Devices and Brain-Computer Interfaces for User Modelling (WeBIUM 2025)

WeBIUM 2025: 2nd Workshop on Wearable Devices and Brain-Computer Interfaces for User Modelling

  • Domenico Lofù
  • Paolo Sorino
  • Tommaso Colafiglio
  • Angela Lombardi
  • Tommaso Di Noia
  • Fedelucio Narducci

Wearable Devices (WDs), such as smartwatches and fitness trackers, continuously produce extensive data streams that reveal valuable information about physiological states, activity patterns, and user interactions. These devices enable the construction of advanced user models, offering dynamic insights into personal routines, health trends, and behavioural tendencies. Meanwhile, Brain-Computer Interfaces (BCIs) are emerging as a transformative technology, capturing neural activity to provide unprecedented access to cognitive and emotional states. Although BCIs are not conventionally classified as wearables, the latest technological advancements have reduced their size to resemble everyday accessories like earphones, suggesting their integration into wearable formats in the near future.

Despite the promise of these technologies, the full exploitation of their data for user modelling and personalization—such as optimizing activities like media consumption or interaction design—remains underexplored. The convergence of WDs and BCIs opens up new avenues for understanding the complexity of human behaviour and preferences, and this potential is amplified by the integration of Large Language Models (LLMs). By synthesizing and interpreting multimodal datasets, LLMs can better understand the intricate interplay between physiological, cognitive, and behavioural signals, ultimately enriching user modelling processes.

Following the success of the first edition, this workshop seeks to delve into the deep impact of combining data from wearable devices, neural interfaces, and advanced machine learning models. Participants will explore the opportunities and challenges that arise in this innovative context, examining how these technologies can be harnessed to enhance the granularity and accuracy of user models. The discussions will also address practical implications, such as ethical considerations and the necessity of privacy-aware approaches when dealing with highly sensitive physiological and neural data.

Through collaborative exchanges, the initiative aspires to chart new directions in the field, fostering novel research trajectories and interdisciplinary partnerships. The interplay of WDs, BCIs, and LLMs can redefine user modelling by creating systems that dynamically adapt to individual needs and behaviours, paving the way for transformative advancements in personalized experiences. By drawing on cutting-edge research and practical expertise, the workshop aims to inspire innovative solutions that capitalize on these emerging synergies, advancing the boundaries of what is possible in user modelling and adaptive systems.

Advanced prompt engineering techniques for generative sound synthesis models

  • Mariagrazia De Leo
  • Giuseppe Salatino
  • Fabrizio Festa

Sound synthesis plays a central role in the compositional process in electroacoustic music. This research aims to investigate and integrate the use of generative artificial intelligence models as tools for sound synthesis. Three models, SynthIo, MusicLM, and MusicGen, were used in this study. Ten prompts were designed and tested with the aim of generating sound textures. To assess the consistency of the generated samples, three expert electroacoustic music composers evaluated the samples against specific requirements.

Emotion Recognition Using Text Embedding Models: Wearable and Wireless EEG Without Fixed EEG Channel Configurations

  • Quoc-Toan Nguyen
  • Huiru Zheng
  • Tahia Tazin
  • Linh Le
  • Tuan L. Vo
  • Nhu-Tri Tran
  • David Williams-King
  • Benjamin Tag

Emotion recognition methods using Artificial Intelligence (AI) and wearable/wireless Electroencephalography (wEEG) are promising, as wEEG signals effectively and conveniently capture brain activities related to emotions. However, conventional AI models require separate development for each wEEG channel configuration, limiting adaptability and increasing costs. To address this gap, this paper proposes a framework that leverages text embedding models to transform wEEG signals into a standardised representation, making different wEEG channel setups compatible with a single AI model. This approach enhances scalability, adaptability, and resource efficiency, making AI-driven emotion recognition more cost-effective and accessible. Our proposed method achieves an accuracy of 0.9368 with snowflake-arctic-embed-l-v2.0 using 2-second epoching and 0.9484 with multilingual-e5-large-instruct using 5-second epoching. The method can be applied across various wEEG channel configurations to support tasks that improve or explore human well-being, such as stress monitoring or emotion self-regulation.

Neural Musical Instruments through Brain-Computer Interface and Biofeedback

  • Tommaso Colafiglio
  • Domenico Lofù
  • Paolo Sorino
  • Angela Lombardi
  • Fedelucio Narducci
  • Tommaso Di Noia

In the electronic musical instrument scenario, the current paradigm of sound modification during live performance is predominantly based on the use of external control mechanisms to adjust sound configurations predefined by the performer. However, this approach is limited by the introduction of marginal latencies during the transition between sound configurations. To overcome these limitations, this study introduces a novel application of Brain-Computer Interface (BCI) technology in a control system environment for musical instruments during live performances. The proposed system exploits classification between mental states of activation and relaxation, employing a Machine Learning (ML) system that achieves an average accuracy of 0.92. Using the Beta protocol, the system allows dynamic modulation of sound according to the performer’s mental state. Finally, an explainability analysis was performed to clarify the impact of specific features during the prediction process.

On the Causality between Cognitive Stress and Physiological Stress: The Stroop Test as a Case Study

  • Abdelmounaam Rezgui

Wearable devices are revolutionizing smart computing in healthcare. In particular, electroencephalography (EEG) devices (e.g., electrode cap bundles, headbands) are currently enabling many healthcare applications that require real-time monitoring of brain electrical activity. Examples of those applications include epilepsy diagnosis, sleep disorder diagnosis, tumor detection, autonomous navigation (e.g., to control wheelchairs), and stress reduction. In many of these applications, the use of clinical-grade EEG devices may not be feasible because of factors such as high cost, privacy concerns, and inconvenience. In this paper, we used the Granger causality test to study whether consumer-grade EEG devices can detect levels of cognitive stress that can reliably be shown to cause changes in vital signs such as blood volume pulse (BVP), electrodermal activity (EDA), and body temperature. Based on the obtained results, we were able to validate the viability of using consumer-grade wearable devices to build applications for stress monitoring and reduction without the need for advanced, expensive EEG devices.

Sonify Collective Human Intelligence: A Biometric Data Approach to Real-time Sound Design in HCI

  • Dario Mattia
  • Fabrizio Festa

Sonify Collective Human Intelligence (SCHI) is a musical installation that explores the cooperative functions of collective intelligence by transforming human interaction into a multisensory experience. Biometric sensors capture real-time physiological variations within a group, translating psycho-emotional shifts into sound and visual design. This interaction creates a data flow that translates emotional state changes into real-time sound textures generated and processed by live electronics. An improvising soloist interacts with this evolving sound design, forming a self-regenerating sound cycle responsive to collective emotions. This research contributes a previously unexplored framework for integrating biometric data into artistic expression, demonstrating the potential of biofeedback in collaborative, emotion-driven interaction and bridging psychology, music technology, and HCI.

User Modeling Meets Research Integrity: Challenges in Translating AI-powered Rehabilitation Systems into Regulated Clinical Practice

  • Ilaria Bortone
  • Feliciana Catino
  • Giuseppe Colacicco
  • Rodolfo Sardone
  • Giuseppe Campanile

Artificial intelligence (AI) is increasingly embedded in rehabilitation technologies designed for children with developmental disorders, offering new opportunities for personalised, adaptive care. However, the translation of these systems from lab to clinic is often decisively shaped by regulatory frameworks such as the EU General Data Protection Regulation (GDPR), the Medical Device Regulation (MDR), and the forthcoming AI Act. This position paper explores how these three regulatory pillars influence the ethical deployment of AI and the design and innovation process behind pediatric rehabilitation tools. Drawing from recent literature and ongoing policy developments, we argue that GDPR, MDR, and the AI Act should not be viewed merely as compliance hurdles but as co-design forces that enable trustworthy, interpretable, and clinically viable AI. We propose a forward-looking framework to align innovation with regulation to facilitate the safe and effective implementation of AI-powered rehabilitation in child-centred healthcare.
