ACM OpenToc Proceedings

UMAP ’22: Proceedings of the 30th ACM Conference on User Modeling, Adaptation and Personalization

SESSION: Persuasion

A Qualitative Analysis of the Persuasive Properties of Argumentation Schemes

Ramon Ruiz-Dolz, Joaquin Taverner, Stella Heras, Ana Garcia-Fornes, Vicente Botti

Argumentation schemes are generalised patterns that provide a way to (partially) dissociate the content from the reasoning structure of the argument. On the other hand, Cialdini’s principles of persuasion provide a generic model to analyse the persuasive properties of human interaction (e.g., natural language). Establishing the relationship between principles of persuasion and argumentation schemes can contribute to the improvement of the argument-based human-computer interaction paradigm. In this work, we perform a qualitative analysis of the persuasive properties of argumentation schemes. For that purpose, we present a new study conducted on a population of over one hundred participants, where twelve different argumentation schemes are instanced into four different topics of discussion considering both stances (i.e., in favour and against). Participants are asked to relate these argumentation schemes with the perceived Cialdini’s principles of persuasion. From the results of our study, it is possible to conclude that some of the most commonly used patterns of reasoning in human communication have an underlying persuasive focus, regardless of how they are instanced in natural language argumentation (i.e., their stance, the domain, or their content).

How to Ask for Donations? Learning User-Specific Persuasive Dialogue Policies through Online Interactions

Nhat Tran, Malihe Alikhani, Diane Litman

Persuasive conversations are more effective when they are custom-tailored for the intended audience. Current persuasive dialogue systems rely heavily on advice-giving or focus on different framing policies in a constrained and less dynamic/flexible manner. In this paper, we argue for a new approach, in which the system can identify optimal persuasive strategies in context and persuade users through online interactions. We study two main questions: (1) Can a reinforcement-learning-based dialogue framework learn to exercise user-specific communicative strategies for persuading users? (2) How can we leverage crowd-sourcing platforms to collect data for training and evaluating such frameworks for human-AI (human-machine) conversations? We describe a prototype system that interacts with users with the goal of persuading them to donate to a charity, and we use experiments with crowd workers and analyses of our learned policies to document that our approach leads to learning context-sensitive persuasive strategies that focus on users’ reactions towards donation and contribute to increasing dialogue success.

Including Social Expectations for Trustworthy Proactive Human-Robot Dialogue

Matthias Kraus, Nicolas Wagner, Nico Untereiner, Wolfgang Minker

Trust is an important factor in human-robot interaction and strongly influences the success or failure of a mixed team of humans and machines. Similarly to human-human teamwork, communication and proactivity are among the keys to task success and efficiency. However, the level of proactive robot behaviour needs to be adapted to a dynamically changing social environment. Otherwise, it may be perceived as counterproductive and the robot’s assistance may not be accepted. For this reason, this work investigates the design of a socially-adaptive proactive dialogue strategy and its effects on humans’ trust and acceptance towards the robot. The strategy is implemented in a human-like household assistance robot that helps in the execution of domestic tasks, such as tidying up or fetch-and-carry tasks. For evaluation of the strategy, users interact with the robot while watching interactive videos of the robot in six different task scenarios. Here, the adaptive proactive behaviour of the robot is compared to four different levels of static proactivity: None, Notification, Suggestion, and Intervention. The results show that proactive robot behaviour that adapts to the social expectations of a user has a significant effect on the perceived trust in the system. In particular, a robot expressing socially-adaptive proactivity is perceived as more competent and reliable than a non-adaptive robot. Based on these results, important implications for the design of future robotic assistants at home are described.

Multidimensional Analysis of Sense of Agency During Goal Pursuit

Roberto Legaspi, Wenzhen Xu, Tatsuya Konishi, Shinya Wada, Yuichi Ishikawa

Sense of agency (SoA) is the subjective experience that one’s own volitional action caused an event to happen. This experience has shed light on fundamental aspects of human behavior, including the regulation of actions during goal pursuit. Due to its many facets, investigating SoA has proved to be a strong challenge, compelling theorists and experimentalists to develop various paradigms to analyze it. While investigations on SoA have primarily focused on simple tasks that probe basic self-agency capacity awareness, and were carried out mostly under controlled laboratory settings over short experiment durations, we investigated this feeling of control in a complex, natural setting where participants performed their goal-directed tasks daily. More importantly, however, we investigated the SoA construct in a multidimensional way, i.e., simultaneously investigating its pre-reflective and reflective, local and general, and dynamic nature, as well as how individual differences moderated its influence on goal pursuit. We collected over 5,000 data points from 43 participants on their daily perceptions of self-agency and pursuance of healthy eating for more than a month outside the confines of a lab using a smartphone app that we designed. We present our analyses and the insights that emerged from our empirical results on how the many facets of SoA impacted the pursuance of the goal in various ways. To our knowledge, we are the first to study the SoA construct in this manner, and we posit that our method can be used by an intelligent system to enhance a human counterpart’s SoA for self-driven persuasion to follow through on the goal.

Nudging Towards Health? Examining the Merits of Nutrition Labels and Personalization in a Recipe Recommender System

Ayoub El Majjodi, Alain D. Starke, Christoph Trattner

Food recommender systems show personalized recipes to users based on content liked previously. Despite their potential, often recommended (popular) recipes in previous studies have turned out to be unhealthy, negatively contributing to prevalent obesity problems worldwide. Changing how foods are presented through digital nudges might help, but these are usually examined in non-personalized contexts, such as a brick-and-mortar supermarket. This study seeks to support healthy food choices in a personalized interface by adding front-of-package nutrition labels to recipes in a food recommender system. After performing an offline evaluation, we conducted an online study (N = 600) with six different recommender interfaces, based on a 2 (non-personalized vs. personalized recipe advice) x 3 (No Label, Multiple Traffic Light, Nutri-Score) between-subjects design. We found that recipe choices made in the non-personalized scenario were healthier, while the use of nutrition labels (our digital nudge) reduced choice difficulty when the content was personalized.
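
As an illustration of the 2 x 3 between-subjects design described above, the six interface conditions can be enumerated as the cross product of the two factors. This is a minimal sketch, not the authors' procedure: the balanced round-robin allocation, the function names, and the seed are assumptions, since the abstract does not say how participants were allocated.

```python
from itertools import product
import random

# The two factors and their levels, as named in the abstract.
ADVICE = ["non-personalized", "personalized"]
LABELS = ["No Label", "Multiple Traffic Light", "Nutri-Score"]


def conditions():
    """Return the six (advice, label) interface conditions of the 2 x 3 design."""
    return list(product(ADVICE, LABELS))


def assign(participant_ids, seed=0):
    """Assign each participant to exactly one condition (between-subjects).

    Assumption: balanced allocation, done by shuffling the participants
    and dealing them round-robin over the conditions.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    conds = conditions()
    return {pid: conds[i % len(conds)] for i, pid in enumerate(ids)}
```

With N = 600 participants, this hypothetical allocation would place 100 participants in each of the six conditions.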

Walking with PACE – Personalized and Automated Coaching Engine

Madhurima Vardhan, Narayan Hegde, Srujana Merugu, Shantanu Prabhat, Deepak Nathani, Martin Seneviratne, Nur Muhammad, Pranay Reddy, Sriram Lakshminarasimhan, Rahul Singh, Karina Lorenzana, Eshan Motwani, Partha Talukdar, Aravindan Raghuveer

We design and implement a personalized and automated physical activity coaching engine, PACE, which uses Fogg’s behavioral model (FBM) to engage users in mini-conversation based coaching sessions. It is a chat-based nudge assistant that can boost (encourage) and sense (ask) the motivation, ability and propensity of users to walk and help them in achieving their step count targets, similar to a human coach. We demonstrate the feasibility, effectiveness and acceptability of PACE by comparing it directly to human coaches in a Wizard-of-Oz deployment study with 33 participants over 21 days. We tracked coach-participant conversations, step counts and qualitative survey feedback. Our findings indicate that the PACE framework strongly emulated human coaching with no significant differences in the overall number of active days, step count and engagement patterns. The qualitative user feedback suggests that PACE cultivated a coach-like experience, offering barrier resolution via motivational and educational support. We use traditional human-computer interaction approaches to interrogate the conversational data and report positive PACE-participant interaction patterns with respect to addressal, disclosure, collaborative target settings, and reflexivity. As a post-hoc analysis, we annotated the conversation logs from the human coaching arm and trained machine learning (ML) models on these data sets to predict the next boost (AUC 0.73 ± 0.02) and sense (AUC 0.83 ± 0.01) action. In future, such ML-based models could be made increasingly personalized and adaptive based on user behaviors.

SESSION: Explanations and Recommendations

Generating Recommendations with Post-Hoc Explanations for Citizen Science

Daniel Ben Zaken, Avi Segal, Darlene Cavalier, Guy Shani, Kobi Gal

Citizen science projects promise to increase scientific productivity while also connecting science with the general public. They create scientific value for researchers and provide pedagogical and social benefits to volunteers. Given the astounding number of available citizen science projects, volunteers find it difficult to find the projects that best fit their interests. This difficulty can be alleviated by providing personalized project recommendations to users. This paper studies whether combining project recommendations with explanations improves users’ contribution levels and satisfaction. We generate post-hoc explanations for users by learning from their past interactions as well as project content (e.g., location, topics). We provide an algorithm for clustering recommended projects into groups based on their predicted relevance to the user. We demonstrated the efficacy of our approach in offline studies as well as in an online study in SciStarter that included hundreds of users. The vast majority of users highly preferred receiving explanations about why projects were recommended to them, and receiving such explanations did not impede the contribution levels of users, when compared to other users who received project recommendations without explanations. Our approach is now fully integrated in SciStarter.

Entity-Enhanced Graph Convolutional Network for Accurate and Explainable Recommendation

Qinqin Wang, Elias Tragos, Neil Hurley, Barry Smyth, Aonghus Lawlor, Ruihai Dong

A recommendation engine that relies solely on interactions between users and items will be limited in its ability to provide accurate, diverse and explanation-rich recommendations. Side information should be taken into account to improve performance. Methods like Factorisation Machines (FM) cast recommendation as a supervised learning problem, where each interaction is viewed as an independent instance with side information encapsulated. Previous studies in top-K recommendation have incorporated knowledge graphs (KG) into the recommender system to provide rich information about the relationships between users, items and entities. Nevertheless, these studies do not explicitly capture the preference of users for the side information. Furthermore, some studies explain the recommendation, but there is no unified method of measuring explanation quality.

In this work, we investigate the utility of Graph Convolutional Networks (GCN) and multi-task learning techniques to capture the tripartite relations between users, items and entities. Based on our study, we propose that in the hybrid structure of the KG, its rich relationships are an essential factor for successful recommendation from both an explanation and performance perspective. We propose a novel method named Light Knowledge Graph Convolutional Network (LKGCN) which explicitly models the high-order connectivities between users, items and entities. Specifically, we use multi-task learning techniques and attention mechanisms in order to combine user preferences on items and entities. Additionally, we present a unified evaluation method PeX for explainable recommendation models. Extensive experiments on real-world datasets show that the LKGCN is conceptually superior to existing graph-based recommendation methods from two perspectives: recommendation accuracy and interpretation. We release the code and datasets on GitHub.

Evaluation of an Engagement-Aware Recommender System for People with Dementia

Lars Steinert, Fynn Linus Kölling, Felix Putze, Dennis Küster, Tanja Schultz

People with Dementia (PwD) and their caregivers can greatly benefit from regular cognitive and social activations. However, these activations need to be engaging and likeable to take effect and to maintain long-term motivation and wellbeing. Taking this into account, finding appropriate items in large activation content catalogues can be a challenging task, which can even lead to unhappiness (“Paradox of Choice”). User-centered Recommender Systems (RS) can help to overcome this obstacle and support PwD and their caregivers in finding engaging and likeable activation contents. In this study, we investigate a dataset collected from PwD and their (in)formal caregivers who jointly used a tablet-based activation system over multiple sessions in an unconstrained care setting. The system applies a content-based recommendation approach based on explicit ratings provided by the PwD and collects audiovisual data during usage. First, we evaluate the real-world user interactions with the RS to gain knowledge about suitable evaluation parameters for our offline analyses. Second, we train a recognition model for engagement based on the audiovisual data and enrich our dataset with the automatically detected information about the PwD’s level of engagement. Last, we apply an offline analysis and compare the RS performance based on different inputs. We show that considering PwD’s level of engagement can help to further improve the rating-based RS in terms of users’ needs and, thus, support them in the activations.

Investigating the Value of Subtitles for Improved Movie Recommendations

Sagi Eden, Amit Livne, Oren Sar Shalom, Bracha Shapira, Dietmar Jannach

Collaborative filtering (CF) is a highly effective recommendation approach based on preference patterns observed in user-item interaction data. Since pure collaborative methods can have certain limitations, e.g., when the data is sparse, hybrid approaches are a common solution, as they are able to combine collaborative information with side-information (SI) about the items. In this work, we explore the value of subtitle information for the problem of movie recommendation. Unlike previously explored types of movie SI, e.g., titles or synopses, subtitles are not only longer, but also contain unique information that may help us to predict more accurately whether a user will enjoy a movie. To assess the usefulness of subtitles, we propose a technical framework named SubtitleCF that combines user and item embeddings derived from interaction data and SI. The subtitles may be embedded in different ways, e.g., Latent Dirichlet Allocation (LDA) and neural techniques. Computational experiments with a framework instantiation that relies on Bayesian Personalized Ranking (BPR) as an industry-strength method for item ranking and different text embedding methods demonstrate the value of subtitles in terms of prediction accuracy and coverage. Moreover, a user study (N=247) reveals that the information contained in subtitles can be leveraged to improve the decision-making processes of users.

The challenges of providing explanations of AI systems when they do not behave like users expect

Maria Riveiro, Serge Thill

Explanations in artificial intelligence (AI) ensure that users of complex AI systems understand why the system behaves as it does. Expectations that users may have about the system behaviour play a role since they co-determine appropriate content of the explanations. In this paper, we investigate user-desired content of explanations when the system behaves in unexpected ways. Specifically, we presented participants with various scenarios involving an automated text classifier and then asked them to indicate their preferred explanation in each scenario. One group of participants chose the type of explanation from a multiple-choice questionnaire, the other had to answer using free text.

Participants show a pretty clear agreement regarding the preferred type of explanation when the output matches expectations: most do not require an explanation at all, while those that do would like one that explains what features of the input led to the output (a factual explanation). When the output does not match expectations, users also prefer different explanations. Interestingly, there is less of an agreement in the multiple-choice questionnaire. However, the free text responses slightly favour an explanation that describes how the AI system’s internal workings led to the observed output (i.e., a mechanistic explanation).

Overall, we demonstrate that user expectations are a significant variable in determining the most suitable content of explanations (including whether an explanation is needed at all). We also find different results, especially when the output does not match expectations, depending on whether participants answered via multiple-choice or free text. This shows a sensitivity to precise experimental setups that may explain some of the variety in the literature.

Top-N Recommendation Algorithms: A Quest for the State-of-the-Art

Vito Walter Anelli, Alejandro Bellogín, Tommaso Di Noia, Dietmar Jannach, Claudio Pomo

Research on recommender systems algorithms, like other areas of applied machine learning, is largely dominated by efforts to improve the state-of-the-art, typically in terms of accuracy measures. Several recent research works however indicate that the reported improvements over the years sometimes “don’t add up”, and that methods that were published several years ago often outperform the latest models when evaluated independently. Different factors contribute to this phenomenon, including that some researchers probably often only fine-tune their own models but not the baselines.

In this paper, we report the outcomes of an in-depth, systematic, and reproducible comparison of ten collaborative filtering algorithms—covering both traditional and neural models—on several common performance measures on three datasets which are frequently used for evaluation in the recent literature. Our results show that there is no consistent winner across datasets and metrics for the examined top-n recommendation task. Moreover, we find that for none of the accuracy measurements any of the considered neural models led to the best performance. Regarding the performance ranking of algorithms across the measurements, we found that linear models, nearest-neighbor methods, and traditional matrix factorization consistently perform well for the evaluated modest-sized, but commonly-used datasets. Our work shall therefore serve as a guideline for researchers regarding existing baselines to consider in future performance comparisons. Moreover, by providing a set of fine-tuned baseline models for different datasets, we hope that our work helps to establish a common understanding of the state-of-the-art for top-n recommendation tasks.

Travelers vs. Locals: The Effect of Cluster Analysis in Point-of-Interest Recommendation

Pablo Sanchez, Linus W. Dietz

The involvement of geographic information differentiates point-of-interest recommendation from traditional product recommendation. This geographic influence is usually manifested in the effect of users tending toward visiting nearby locations, but further mobility patterns can be used to model different groups of users. In this study, we characterize the check-in behavior of local and traveling users in a global Foursquare check-in data set. Based on the features that capture the mobility and preferences of the users, we obtain representative groups of travelers and locals through an independent cluster analysis. Interestingly, for locals, the mobility features analyzed in this work seem to degrade the cluster quality, whereas these signals are fundamental in defining the traveler clusters. To measure the effect of such a cluster analysis when categorizing users, we compare the performance of a set of recommendation algorithms, first on all users together, and then on each user group separately in terms of ranking accuracy, novelty, and diversity. Our results on the Foursquare data set of 139,270 users in five cities show that locals, despite being the most numerous group of users, tend to obtain lower values than the travelers in terms of ranking accuracy while these locals also seem to receive more novel and diverse POI recommendations. For travelers, we observe the advantages of popularity-based recommendation algorithms in terms of ranking accuracy, by recommending venues related to transportation and large commercial establishments. However, there are huge differences among the respective traveler groups, especially between predominantly domestic and international travelers. Due to the large influence of mobility on the recommendations, this article underlines the importance of analyzing user groups differently when making and evaluating personalized point-of-interest recommendations.

SESSION: Classifying User Behavior

HateVersarial: Adversarial Attack Against Hate Speech Detection Algorithms on Twitter

Edita Grolman, Hodaya Binyamini, Asaf Shabtai, Yuval Elovici, Ikuya Morikawa, Toshiya Shimizu

Machine learning (ML) models are commonly used to detect hate speech, which is considered one of the main challenges of online social networks. However, ML models have been shown to be vulnerable to well-crafted input samples referred to as adversarial examples. In this paper, we present an adversarial attack against hate speech detection models and explore the attack’s ability to: (1) prevent the detection of a hateful user, which should result in termination of the user’s account, and (2) classify normal users as hateful, which may lead to the termination of a legitimate user’s account. The attack is targeted at ML models that are trained on tabular, heterogeneous datasets (such as the datasets used for hate speech detection) and attempts to determine the minimal number of the most influential mutable features that should be altered in order to create a successful adversarial example. To demonstrate and evaluate the attack, we used the open and publicly available “Hateful Users on Twitter” dataset. We show that under a black-box assumption (i.e., the attacker does not have any knowledge of the attacked model), the attack has a 75% success rate, whereas under a white-box assumption (i.e., the attacker has full knowledge of the attacked model), the attack has an 88% success rate.

Low-Level Activity Patterns as Indicators of User Familiarity with Websites

He Yu, Simon Harper, Markel Vigo

Familiarity is a quality of user experience that has traditionally been difficult to define, capture, and quantify. Existing works on measuring familiarity with interactive systems have relied on surveys and self-reporting, which are obtrusive and prone to biases. Here, we propose a data-driven methodology to associate low-level activity patterns with familiarity. As a proof-of-concept, this methodology was tested on a website with 35,819 users over the course of 18 months, including 268 revisiting users who had reported their levels of familiarity with the platform. By using activity patterns as features of predictive models, we were able to successfully classify users with higher levels of familiarity with an accuracy of 82.7%. These results suggest that there is a relationship between user familiarity and activity patterns involving the exploration and use of navigational artefacts including breadcrumbs, navigation bars, and sidebar areas. This research opens up further opportunities for unobtrusively analysing the user experience on the Web.

On the benefits of using Hidden Markov Models to predict emotions

Yuyan Wu, Miguel Arevalillo Herráez, Stamos Katsigiannis, Naeem Ramzan

The availability of low-cost wireless physiological sensors has allowed the use of emotion recognition technologies in various applications. In this work, we describe a technique to predict emotional states in Russell’s two-dimensional emotion space (valence and arousal), using electroencephalography (EEG), electrocardiography (ECG), and electromyography (EMG) signals. For each of the two dimensions, the proposed method uses a classification scheme based on two Hidden Markov Models (HMMs), with the first one trained using positive samples, and the second one using negative samples. The class of new unseen samples is then decided based on which model returns the highest score. The proposed approach was validated on a recently published dataset that contained physiological signal recordings (EEG, ECG, EMG) acquired during a human-horse interaction experiment. The experimental results demonstrate that this approach achieves a better performance than the published baseline methods, achieving an F1-score of 0.940 for valence and 0.783 for arousal, an improvement of more than +0.12 in both cases.
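
The decision rule described above (score a sample under a "positive" and a "negative" HMM and pick the higher score) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: observations are assumed to be discrete symbols rather than continuous physiological features, training is omitted, and the two parameter sets `POSITIVE` and `NEGATIVE` are hypothetical hand-set values.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def classify(obs, positive_model, negative_model):
    """Return the label of the model that scores the sequence higher."""
    pos = log_likelihood(obs, *positive_model)
    neg = log_likelihood(obs, *negative_model)
    return "positive" if pos >= neg else "negative"


# Hypothetical two-state models over the symbols 0 and 1 (not learned from data):
# the positive model mostly emits 1, the negative model mostly emits 0.
POSITIVE = ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.1, 0.9]])
NEGATIVE = ([0.5, 0.5], [[0.8, 0.2], [0.3, 0.7]], [[0.9, 0.1], [0.7, 0.3]])
```

In the scheme described in the abstract, one such pair of models would be used per dimension (valence and arousal), each model being trained on the samples of its own class.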

User’s Knowledge and Information Needs in Information Retrieval Evaluation

Dima El Zein, Célia da Costa Pereira

The existing evaluation measures for information retrieval algorithms still lack awareness about the user’s cognitive aspects and their dynamics. They often consider an isolated query-document environment and ignore the user’s previous knowledge and his/her motivation behind the query. The retrieval algorithms and evaluation measures that account for those factors limit the result’s relevance to one search session, one query, or one search goal. We present a novel evaluation measure that overcomes this limitation. The framework measures the relevance of a result/document by examining its content and assessing the possible learning outcomes, for a specific user. Hence not all documents are relevant to all users. The proposed evaluation measure rewards the results’ content for their novelty with respect to what the user already knows and what has been previously proposed. The results are also rewarded for their contribution to achieving the search goals/needs. We demonstrate the efficiency of the measure by comparing it to the knowledge gain reported by 361 crowd-sourced users searching the Web across 10 different topics.

SESSION: Intelligent User Interfaces 1

How Suitable Is Your Naturalistic Dataset for Theory-based User Modeling?

Aini Putkonen, Aurélien Nioche, Ville Tanskanen, Arto Klami, Antti Oulasvirta

Theory-based, or “white-box,” models come with a major benefit that makes them appealing for deployment in user modeling: their parameters are interpretable. However, most theory-based models have been developed in controlled settings, in which researchers determine the experimental design. In contrast, real-world application of these models demands setups that are beyond developer control. In non-experimental, naturalistic settings, the tasks with which users are presented may be very limited, and it is not clear that model parameters can be reliably inferred. This paper describes a technique for assessing whether a naturalistic dataset is suitable for use with a theory-based model. The proposed parameter recovery technique can warn against possible over-confidence in inferred model parameters. This technique also can be used to study conditions under which parameter inference is feasible. The method is demonstrated for two models of decision-making under risk with naturalistic data from a turn-based game.

Modeling User Strategies on Interactive Information Dashboards

Mohammed Alhamadi, Sarah Clinch, Markel Vigo

Interacting with and making sense of information dashboards is often problematic. Typically, users develop strategies to go around and overcome these problems. These strategies can be conceived as behavioural markers of cognitive processes that indicate problematic interactions. Consequently, if we were able to computationally model these strategies, we could detect if users are encountering problems in real time (and act accordingly). We conducted an experiment (N=63) to identify the interaction strategies users employ on problematic dashboards. We found that while existing challenges impact significantly on user performance, interventions to mitigate such challenges were especially beneficial for those with lower graph literacy. We identified the strategies employed by users when encountering problems: extensive page exploration as a reaction to information overload and use of customisation functionalities when understanding data is problematic. We also found that some strategies are indicators of performance in terms of task completion time and effectiveness: extensive exploration strategies were indicators of lower performance, while the exhibition of customisation strategies is associated with higher effectiveness.

Navigational, Informational or Punk-Rock? An Exploration of Search Intent in the Musical Domain

Bruno Sguerra, Marion Baranes, Romain Hennequin, Manuel Moussallam

The search engine of music streaming platforms is a high-control method for navigating the catalog. If one is to study users’ search behavior in this context, one can leverage the vast body of research on general information behavior while challenging previously well-validated models with the domain-specific differences. Due to the nature of musical content, users present a different set of needs and behaviors than in traditional web search. For instance, some users employ the search engine as a means to drive their listening session, inputting many queries in close succession not related to the same information goal.

In this paper, we investigate users’ search goals and how they modulate information behavior in the context of streaming platforms. To this end, we explore real search sessions of users looking for musical content in the context of a major streaming service. We introduce a data-driven method for identifying classes of information needs by aggregating both low-level activity patterns and relative query specificity. We show that, when combined, these features provide an approach not only for isolating classes of user search intent, but for understanding human-music relationship as a whole.

Taming User-Interface Heterogeneity with Uniform Overlays for Blind Users

Utku Uckun, Rohan Tumkur Suresh, Md Javedul Ferdous, Xiaojun Bi, I.V. Ramakrishnan, Vikas Ashok

For many blind users, interaction with computer applications using screen reader assistive technology is a frustrating and time-consuming affair, mostly due to the complexity and heterogeneity of applications’ user interfaces. An interview study revealed that many applications do not adequately convey their interface structure and controls to blind screen reader users, thereby placing additional burden on these users to acquire this knowledge on their own. This is often an arduous and tedious learning process given the one-dimensional navigation paradigm of screen readers. Moreover, blind users have to repeat this learning process multiple times, i.e., once for each application, since applications differ in their interface designs and implementations. In this paper, we propose a novel push-based approach to make non-visual computer interaction easy, efficient, and uniform across different applications. The key idea is to make screen reader interaction ‘structure-agnostic’, by automatically identifying and extracting all application controls and then instantly ‘pushing’ these controls on demand to the blind user via a custom overlay dashboard interface. Such a custom overlay facilitates uniform and efficient screen reader navigation across all applications. A user study showed significant improvement in user satisfaction and interaction efficiency with our approach compared to a state-of-the-art screen reader.

Understanding the Role of Explanation Modality in AI-assisted Decision-making

Vincent Robbemond, Oana Inel, Ujwal Gadiraju

Advances in artificial intelligence and machine learning have led to a steep rise in the adoption of AI to augment or support human decision-making across domains. There has been an increasing body of work addressing the benefits of model interpretability and explanations to help end-users or other stakeholders decipher the inner workings of the so-called “black box AI systems”. Yet, little is currently understood about the role of modalities through which explanations can be communicated (e.g., text, visualizations, or audio) to inform, augment, and shape human decision-making. In our work, we address this research gap through the lens of a credibility assessment system. Considering the deluge of information available through various channels, people constantly make decisions while considering the perceived credibility of the information they consume. However, with an increasing information overload, assessing the credibility of the information we encounter is a non-trivial task. To help users in this task, automated credibility assessment systems have been devised as decision support systems in various contexts (e.g., assessing the credibility of news or social media posts). However, for these systems to be effective in supporting users, they need to be trusted and understood. Explanations have been shown to play an essential role in informing users’ reliance on decision support systems. In this paper, we investigate the influence of explanation modalities on an AI-assisted credibility assessment task. We use a between-subjects experiment (N = 375), spanning six different explanation modalities, to evaluate the role of explanation modality on the accuracy of AI-assisted decision outcomes, the perceived system trust among users, and system usability. Our results indicate that explanations play a significant role in shaping users’ reliance on the decision support system and, thereby, the accuracy of decisions made.
We found that users performed with higher accuracy while assessing the credibility of statements in the presence of explanations. We also found that users had a significantly harder time agreeing on statement credibility without explanations. With explanations present, text and audio explanations were more effective than graphic explanations. Additionally, we found that combining graphical with text and/or audio explanations was significantly effective. Such combinations of modalities led to a higher user performance than using graphical explanations alone.

SESSION: Intelligent User Interfaces 2

Critiquing-based Modeling of Subjective Preferences

Alan Medlar, Jing Li, Yang Liu, Dorota Glowacka

Applications designed for entertainment and other non-instrumental purposes are challenging to optimize because the relationships between system parameters and user experience can be unclear. Ideally, we would crowdsource these design questions, but existing approaches are geared towards evaluation or ranking discrete choices and not for optimizing over continuous parameter spaces. In addition, users are accustomed to informally expressing opinions about experiences as critiques (e.g. it’s too cold, too spicy, too big), rather than giving precise feedback as an optimization algorithm would require. Unfortunately, it can be difficult to analyze qualitative feedback, especially in the context of quantitative modeling.

In this article, we present collective criticism, a critiquing-based approach for modeling relationships between system parameters and subjective preferences. We transform critiques, such as “it was too easy/too challenging”, into censored intervals and analyze them using interval regression. Collective criticism has several advantages over other approaches: “too much/too little”-style feedback is intuitive for users and allows us to build predictive models for the optimal parameterization of the variables being critiqued. We present two studies where we model: These studies demonstrate the flexibility of our approach, and show that it produces robust results that are straightforward to interpret and in line with users’ stated preferences.
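The idea of turning critiques into censored intervals can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an intercept-only model in which each user's ideal parameter value is normally distributed, and it fits the mean and spread by a coarse grid search over the interval likelihood. The function names, the critique labels "too little" and "too much", and the grid search are all assumptions made for illustration.

```python
import math

def to_interval(value, critique):
    """Map a critique of a tested parameter value to a censored interval
    (lower, upper) for the user's latent ideal value (hypothetical labels)."""
    if critique == "too little":
        return (value, math.inf)    # ideal lies above the tested value
    if critique == "too much":
        return (-math.inf, value)   # ideal lies below the tested value
    raise ValueError(f"unknown critique: {critique!r}")

def normal_cdf(x, mu, sigma):
    """Normal CDF that also accepts infinite bounds."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return 0.0
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def log_likelihood(intervals, mu, sigma):
    """Log-likelihood of interval-censored observations under N(mu, sigma)."""
    total = 0.0
    for lower, upper in intervals:
        p = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
        total += math.log(max(p, 1e-12))  # guard against log(0)
    return total

def fit_ideal(intervals, mus, sigmas):
    """Return the (mu, sigma) pair on the grid with the highest likelihood."""
    candidates = ((m, s) for m in mus for s in sigmas)
    return max(candidates, key=lambda ms: log_likelihood(intervals, ms[0], ms[1]))

# Made-up feedback: (tested difficulty level, critique)
feedback = [(1, "too little"), (2, "too little"), (3, "too little"),
            (7, "too much"), (8, "too much"), (9, "too much")]
intervals = [to_interval(v, c) for v, c in feedback]
mu_hat, sigma_hat = fit_ideal(intervals,
                              mus=[i * 0.25 for i in range(41)],
                              sigmas=[0.5, 1.0, 2.0, 4.0])
```

A full interval regression would also include covariates and a proper optimizer; the grid search only keeps the sketch dependency-free.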

I Cannot See Students Focusing on My Presentation; Are They Following Me? Continuous Monitoring of Student Engagement through “Stungage”

Snigdha Das, Sandip Chakraborty, Bivas Mitra

Monitoring students’ engagement and understanding their learning pace in a virtual classroom becomes challenging in the absence of direct eye contact between the students and the instructor. Continuous monitoring of eye gaze and gaze gestures may produce inaccurate outcomes when the students are allowed to do productive multitasking, such as taking notes or browsing relevant content. This paper proposes Stungage – a software wrapper over existing online meeting platforms to monitor students’ engagement in real-time by utilizing the facial video feeds from the students and the instructor coupled with a local on-device analysis of the presentation content. The crux of Stungage is to identify a few opportunistic moments when the students should visually focus on the presentation content if they can follow the lecture. We investigate these instances and analyze the students’ visual, contextual, and cognitive presence to assess their engagement during the virtual classroom while not directly sharing the video captures of the participants and their screens over the web. Our system achieves an overall F2-score of 0.88 for detecting student engagement. Besides, we obtain 92 responses from the usability study with an average SU score of 74.18.
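For readers unfamiliar with the F2-score reported above, the following minimal sketch shows how an F-beta score is computed from confusion counts. With beta = 2, recall is weighted more heavily than precision. The function name and the example counts are illustrative assumptions and are unrelated to the Stungage data.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives and false negatives.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R),
    where P is precision and R is recall.
    """
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Made-up counts: precision = 0.5, recall = 1.0
score = f_beta(tp=2, fp=2, fn=0)
```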

Is More Always Better? The Effects of Personal Characteristics and Level of Detail on the Perception of Explanations in a Recommender System

Mohamed Amine Chatti, Mouadh Guesmi, Laura Vorgerd, Thao Ngo, Shoeb Joarder, Qurat Ul Ain, Arham Muslim

Despite the acknowledgment that the perception of explanations may vary considerably between end-users, explainable recommender systems (RS) have traditionally followed a one-size-fits-all model, whereby the same explanation level of detail is provided to each user, without taking into consideration individual user’s context, i.e., goals and personal characteristics. To fill this research gap, we aim in this paper at a shift from a one-size-fits-all to a personalized approach to explainable recommendation by giving users agency in deciding which explanation they would like to see. We developed a transparent Recommendation and Interest Modeling Application (RIMA) that provides on-demand personalized explanations of the recommendations, with three levels of detail (basic, intermediate, advanced) to meet the demands of different types of end-users. We conducted a within-subject study (N=31) to investigate the relationship between user’s personal characteristics and the explanation level of detail, and the effects of these two variables on the perception of the explainable RS with regard to different explanation goals. Our results show that the perception of explainable RS with different levels of detail is affected to different degrees by the explanation goal and user type. Consequently, we suggested some theoretical and design guidelines to support the systematic design of explanatory interfaces in RS tailored to the user’s context.

The Impact of Digital Nudging Techniques on the Formation of Self-Assembled Crowd Project Teams

Federica Vinella, Rosa Mosch, Ioanna Lykourentzou, Judith Masthoff

Self-assembling team formation systems, where online users can select their teammates, are gaining research and industry interest. Still, the benefits of diversity remain frequently untapped for these teams, as people tend to choose others similar to them. In this study, we examine whether making users aware of the team’s diversity can impact their selections. In a study involving 120 crowd participants, working on the scenario of a crowdsourced innovation project, we tested the effects of two choice architecture and nudging techniques. The first technique displayed explicit personalized diversity information in the form of the current team diversity score and diversity recommendations. The second technique used diversity priming, in the form of counter-stereotypes and all-inclusive multiculturalism. Our results indicate that, while priming deterred participants from picking teammates of different regions, displaying diversity information was the only factor to positively enhance diverse choices. These results were not moderated by the users’ “need to belong” levels, an intrinsic motivation justifying one’s need to form social ties. Other factors which we also find to predict selection behavior were the participants’ region of origin, participants’ gender, teammates’ functional backgrounds, and teammates’ order of appearance. In light of these findings, we suggest that nudging techniques need to be cautiously applied to online team formation as the different techniques differ in their ability to evoke diversity among intrinsically diverse crowds, and that personalised displaying of diversity information seems most promising.

SESSION: Technology-enhanced Adaptive Learning

Automatic Evaluation of Aspects of Performance and Scheduling in Playing the Piano

Hila Tamir-Ostrover, Gilad Baruch, Or Peleg, Yonatan Yellin, Maor Rosenberg, Alexandra Moringen, Kathrin Krieger, Helge Ritter, Jason Friedman

There is a growing trend to teach playing an instrument such as a piano at home using an automated system. A key component of such systems is the ability to rate performance of the learner in order to provide feedback and select appropriate exercises. In this study, we expand on previous works that have developed automatic evaluation systems for an overall grade by also providing predictions for specific aspects of performance: pitch, rhythm, tempo, and articulation & dynamics, as well as scheduling what is an appropriate next task. We describe how a set of salient features is extracted by comparing MIDI performance data of three piano players to an ideal performance, how the features used for evaluation are selected, and evaluate using linear regression how well the selected features are able to predict the mean scores given by a group of domain experts (piano teachers). Relatively good R² scores (0.54 to 0.68) are achieved using a small number of features (2 – 4). Such automatic evaluation of different aspects of performance can be used as a part of an automatic learning system, and to help provide learners with detailed feedback on their performance.
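The coefficient of determination used above to judge how well predicted scores match expert ratings can be sketched as follows. This is a generic illustration, not the authors' evaluation code; the function name and the sample scores are assumptions.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1.0 - ss_res / ss_tot

# Made-up expert ratings and model predictions
expert_scores = [2.0, 3.0, 4.0, 5.0]
model_scores = [2.5, 2.5, 4.5, 4.5]
fit_quality = r_squared(expert_scores, model_scores)
```

A value of 1 means the predictions reproduce the ratings exactly, while a value of 0 means they do no better than always predicting the mean rating.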

Towards Supporting Adaptive Training of Injection Procedures: Detecting Differences in the Visual Attention of Nursing Students and Experts

Minghao Cai, Bin Zheng, Carrie Demmans Epp

A large part of health education involves acquiring operational skills that require continued cooperation from the sense of sight. Traditional skill-training systems heavily rely on post-task evaluation and cannot automatically identify the tacit knowledge (e.g., visual attention allocation) that is used to support the decision-making and tool manipulation process. Recent research has started addressing this issue by identifying learner behaviours to understand learner needs and respond with personalized real-time support. To enable this type of adaptivity when training people in how to give injections, we conducted a study to investigate users’ visual attention. We analyzed the eye-tracking data of 8 skilled medical practitioners and 10 student nurses recorded while they performed an injection task on a simulated abdominal pad. Differences in the detected visual attention and eye-scan patterns across expertise levels create an opportunity for using overlay models or other diagnosis approaches to provide adaptive support to nurse trainees.

What do Students’ Interactions with Online Lecture Videos Reveal about their Learning?

Guojing Zhou, Tetsumichi Umada, Sidney D’Mello

Video viewing is an important component of online learning, yet little is known about what information about learning outcomes can be derived from students’ video control actions. We investigate the extent to which information on student learning is contained in their video-watching clickstreams (e.g. pausing, playing) immediately after watching a video. We analyzed data from 10,492 students who used an online learning platform for their Algebra 1 course. Our experiments encode students’ video-control clickstreams into sequences in several ways (e.g. aggregate actions, shuffle actions, and merge action types), and train Long Short-term Memory (LSTM) neural network models to predict after-video quiz scores (N = 32,482) from the sequences in a student-independent fashion. The results suggest that the action sequences contain a limited amount of information about student learning (r = 0.108 between model-predicted and actual quiz scores), with most of the information in simple counts of actions (r = 0.081) rather than the temporal ordering of actions. Combining information from video action sequences and traditional knowledge estimates from item-response theory (IRT) outperformed (r = 0.224) either approach independently. Implications for student modeling and adaptive learning support for viewing lecture videos are discussed.
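The r values above are correlations between predicted and actual quiz scores. As a minimal sketch, assuming r denotes the Pearson correlation coefficient (the abstract does not name the coefficient), it can be computed as below. The function name and the sample data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up predicted and actual quiz scores
predicted = [0.6, 0.7, 0.5, 0.8]
actual = [0.5, 0.9, 0.4, 0.7]
r = pearson_r(predicted, actual)
```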

SESSION: Doctoral Consortium

Adaptive Emotional Support for Groups

Isabella Saccardi

When faced with a difficult situation, searching for emotional support is one of the most natural things to do. Being supported not only helps regulate negative emotions, but also promotes coping skills and psychological adjustment to stressful situations. However, not all emotional support attempts are effective, and support is not always available. The increasing usage of technology may offer a solution by providing an emotional support virtual agent, capable of delivering support via smartphone or computer, anytime and anywhere. Such an agent may adapt to one’s characteristics and situation, providing supportive feedback tailored to one’s needs. In my thesis, I will focus on how to provide emotional support to groups of students. Emotional support will be adapted to the stressors that students typically encounter, and to the challenges linked to working in groups. I will study how people adapt emotional support statements to both individual and situational factors, and investigate how this can be implemented in a virtual agent.

Beyond-Accuracy Perspectives on Graph Neural Network-Based Models for Behavioural User Profiling

Erasmo Purificato

The presented doctoral research aims to develop a behavioural user profiling framework focusing simultaneously on three beyond-accuracy perspectives: privacy, to study how to intervene on graph data structures of specific contexts and provide methods to make the data available in a meaningful manner without either exposing personal user information or corrupting profile creation and system performance; fairness, to provide user representations that are free of any inherited discrimination which could affect a downstream recommender by developing debiasing approaches to be applied on state-of-the-art GNN-based user profiling models; explainability, to produce understandable descriptions of the framework results, both for user profiles and recommendations, mainly in terms of interaction importance, by designing an adaptive and personalised user interface which provides tailored explanations to the end-users, depending on their specific user profiles.

Interactive Interventions to Mitigate Cognitive Bias

Alisa Rieger

While the web offers a great potential to find and share information, the cognitively demanding conditions of online interactions can leave users vulnerable to cognitive biases, such as the confirmation bias – the tendency to favor information that confirms prior attitudes and beliefs when searching for, selecting, interpreting, sharing, and recalling information. This can negatively impact individuals’ decision-making and is likely to drive ideological polarization and extremism. With my dissertation, I am investigating whether and how interactive bias mitigation interventions, with a special focus on confirmation bias, could empower web users in making informed, unbiased, and autonomous choices. Based on my findings and observations, I plan to build a framework of user- and context-adaptive bias mitigation approaches during different kinds of web interactions.

KA-Recsys: Patient Focused Knowledge Appropriate Health Recommender System

Khushboo Thaker

Patients with chronic diseases, such as diabetes, cancer, and heart disease, actively participate in disease management and seek health information on a constant basis for decision-making and self-management. Patient-focused health recommender systems (PHRSs) that suggest health information relevant to patients’ changing information needs across their disease trajectory can provide significant help to patients as they manage their disease on a day-to-day basis. A unique requirement of a PHRS would be to suggest health information in line with patients’ changing knowledge about the disease. It is crucial to recognize that patients’ knowledge of the disease may change as they become more actively involved in understanding and self-managing the illness. Patients who are provided with appropriate information are more likely not only to understand and engage, but also to learn. Hence, the purpose of this doctoral thesis is to explore technologies in the field of recommender systems and personalized learning for the purpose of suggesting health information that accounts for patients’ dynamic information needs and level of knowledge about the disease. We will explore these ideas in the context of developing a knowledge-appropriate PHRS (KA-Recsys). As a case study, a recommender will be integrated into an existing information access portal for ovarian cancer patients. To assess the utility of KA-Recsys, the system will be evaluated based on expert-based and patient-based feedback. The expectation is that health information suggested by KA-Recsys will increase as well as benefit patients’ involvement in self-management and treatment decisions.

Personalized, context-aware communication in multimodal public transport

Anouk van Kasteren, Marloes Vredenborg

The way we experience time is based on its psychological value. Time spent efficiently or pleasantly positively influences the time experience. When choosing a transportation mode, people highly value flexibility, reliability, and autonomy. Public transport, although used frequently, does not always fulfill these values. With personalization, the public transport experience could become more efficient and pleasant. This Ph.D. research aims to personalize public transport information systems (PTIS) through context-aware communication. To achieve this goal, the following activities are planned: (1) a systematic literature review of personalization in public transport; (2) creating an elaborate passenger context model based on existing literature, expert reviews, and user studies; (3) applying this model to offer context-aware information, advice, and inspiration to public transport passengers. At the time of writing, an initial version of the context model has been produced. The next step will be validating and improving the model.

Personalizing Persuasive Principles to Improve Credibility

Felix Nti Koranteng

The role of credibility in persuasive systems is undoubted. It is therefore important that the determinants of credibility are clearly understood. Studies have asserted that personalizing persuasive principles, for instance, based on users’ personality, age, or gender is more effective in influencing behavior. However, there is little knowledge on how credibility determinants can be adapted to increase a system’s credibility and system persuasiveness for specific users. This paper describes ongoing research that attempts to categorize credibility determinants based on their effectiveness for different users and contexts. The paper presents the research plan and findings from three initial studies.

Scrutability of Intelligent Personal Assistants

Jovan Jeromela

Intelligent personal assistants (IPAs) have become widely available, yet they remain primarily used for discrete, straightforward tasks. By contrast, both user studies and literature reviews indicate that IPAs of the future are to be personalised, proactive, and capable of performing elaborate undertakings. Such systems would have to be based on complex and dynamic user and context models. We believe that scrutability – i.e. the ability of the user to actively study and modify the models towards tuning personalisation – could emerge as an essential element of such a human-assistant interaction paradigm. Yet, to the best of our knowledge, no work so far has investigated how the principles of scrutability, as presented in [21], relate to the context and novel challenges raised by the proactive IPAs and how scrutability could facilitate effort-efficient control of the assistants. This paper introduces our vision of the confluence of the research fields of IPAs and scrutability, presents a diagram of the proposed interaction structure, and reanalyses data from user studies originally presented in [11, 39] to better understand user expectations regarding scrutability and proactivity of IPAs.

Service-aware Recommendation and Justification of Results

Zhongli Filippo Hu

The opinions of people who previously experienced items are crucial to decision-making. My Ph.D. research project is focused on finding a better way to recommend experience goods and in particular services such as apartments and tourism experiences by exploiting a description of the service underlying item fruition, such as Service Journey Maps or Blueprint. Regarding the recommender system, I propose an extension of a Top-N algorithm that takes into account service-based dimensions. For the presentation of the results, I plan to develop an incremental view that holistically summarizes the items, showing quantitative data in bar graphs and qualitative data extracted from previous consumer feedback. As a testbed for the research, I exploited the home-booking domain, using publicly available data from Airbnb.

Supporting Group Decision-Making Processes based on Group Dynamics

Hanif Emamgholizadeh

Organizations, friends, family members, etc., form groups and hold discussions to make decisions about the problems they face. Group decision-making is a complicated and time-consuming process since the group members involved in a discussion may have different personalities and opinions about the potential options. In other words, ending up with a choice that satisfies all the group members is challenging. Hence, group recommender systems have emerged and attracted attention in the past two decades to support groups dealing with this difficulty. In fact, the main aims of group recommender systems are to help groups reach a good, fairer, and appropriately justifiable choice at limited cost. In this research, we aim to address three main questions regarding the group decision-making process and with respect to the main aims of group recommender systems: (i) how to predict groups’ choices, (ii) how to design a conversational system for groups, and (iii) how to support groups in a dynamic environment. Conversational group recommender systems can use the predicted group choice to interact more effectively with groups. For example, conversational group recommender systems can use this information by narrowing down the candidate set to the items that are not very different from the predicted choice and recommending them in each turn. Thereby, the system can help groups to reduce decision-making costs (time). Conversational GRSs are also able to capture the dynamics of groups, i.e., how the group preferences evolve during the discussion, since group members give feedback to the system during their interaction with it. The system can exploit this feedback to infer the evolution of the group. The supporting functionalities can use this information to support the groups in different group dynamic states. This research attempts to address these questions.

SESSION: Tutorials

Ethical Considerations in User Modeling and Personalization (ECUMAP): ACM UMAP 2022 Tutorial

Jim Torresen, Atsushi Nakazawa

Ethical considerations are getting increased attention with regard to providing responsible personalization for robots and autonomous systems. This is partly a result of the currently limited deployment of such systems in human support and interaction settings. There are many different ethical considerations, and it is important to identify those relevant to one’s own work within user modelling and personalization. The tutorial paper will give an overview of the most commonly expressed ethical challenges and of the measures being taken to reduce their impact, drawing on the findings of an earlier review supplemented with recent work and initiatives. That includes the challenges identified in a “Statement on research ethics in artificial intelligence”.

The tutorial related to this paper illustrated the challenges related to privacy, security and safety through several examples from our own and others’ work. With tutorial presenters from two different continents (Asia and Europe), the tutorial covered the views on ethical assessment across countries and cultures. Thus, it illustrated to what extent perspectives are similar and what are regarded as the most important ethical aspects in the two countries.

Semantics-aware Content Representations for Reproducible Recommender Systems (SCoRe)

Pasquale Lops, Cataldo Musto, Marco Polignano

In the traditional categorization of recommendation techniques, content-based methods are often considered as an alternative to the most widely adopted collaborative filtering approaches. Content-based recommender systems suggest items similar to a user profile by matching attributes obtained by processing textual content. In order to deal with natural language ambiguity, semantics-aware representations can help to build more precise representations of users and items, and, in turn, to generate better recommendations. This tutorial (i) presents the most recent trends in the area of semantics-aware content-based recommender systems, including novel representation methods based on knowledge graphs and embedding techniques, (ii) discusses how to implement reproducible pipelines for semantics-aware recommender systems, and (iii) presents a new and comprehensive Python framework called ClayRS to deal with semantics-aware recommender systems.

User Control in Adaptive Information Access

Peter Brusilovsky

Adaptive information access systems are one of the most popular types of adaptive systems. They include adaptive search, recommender systems, adaptive navigation support, and adaptive information visualization. This is also one of the oldest types of adaptive systems where the ideas of user control were explored. Motivated by a timely selection of user control as the theme of UMAP 2022, this tutorial provides a systematic review of approaches focused on adding various kinds of user control to adaptive information access systems and discusses lessons learned, prospects, and challenges of this direction of research.