Agentic benchmarks increasingly rely on LLM-simulated users to scalably evaluate agent performance, yet the robustness, validity, and fairness of this approach remain unexamined. Through a user study with participants across the United States, India, Kenya, and Nigeria, we investigate whether LLM-simulated users serve as reliable proxies for real human users in evaluating agents on τ-Bench retail tasks. We find that user simulation lacks robustness, with agent success rates varying up to 9 percentage points across different user LLMs. Furthermore, evaluations using simulated users exhibit systematic miscalibration, underestimating agent performance on challenging tasks and overestimating it on moderately difficult ones. African American Vernacular English (AAVE) speakers experience consistently worse success rates and calibration errors than Standard American English (SAE) speakers, with disparities compounding significantly with age. We also find simulated users to be a differentially effective proxy for different populations, performing worst for AAVE and Indian English speakers. Additionally, simulated users introduce conversational artifacts and surface different failure patter
Large language model-based AI companions are increasingly viewed by users as friends or romantic partners, leading to deep emotional bonds. However, they can generate biased, discriminatory, and harmful outputs. Recently, users are taking the initiative to address these harms and re-align AI companions. We introduce the concept of user-driven value alignment, where users actively identify, challenge, and attempt to correct AI outputs they perceive as harmful, aiming to guide the AI to better align with their values. We analyzed 77 social media posts about discriminatory AI statements and conducted semi-structured interviews with 20 experienced users. Our analysis revealed six common types of discriminatory statements perceived by users, how users make sense of those AI behaviors, and seven user-driven alignment strategies, such as gentle persuasion and anger expression. We discuss implications for supporting user-driven value alignment in future AI systems, where users and their communities have greater agency.
User Experience (UX) evaluation methods that are commonly used with hearing users may not be functional or effective for Deaf users. This is because these methods are primarily designed for hearing users, which can limit how Deaf individuals interact with, perceive, and understand them. Furthermore, traditional UX evaluation approaches often fail to address the unique accessibility needs of Deaf users, resulting in an incomplete or biased assessment of their user experience. This research analyzed a set of UX evaluation methods recommended for use with Deaf users, with the aim of validating the accessibility of each method through findings and limitations. The results indicate that, although the evaluation methods presented here are commonly recommended in the literature for use with Deaf users, they have various limitations that must be addressed in order to better adapt them to the communication skills specific to the Deaf community. This research concludes that evaluation methods must be adapted to ensure accessible software evaluation for Deaf individuals, enabling the collection of data that accurately reflects their
Recommendation systems are pervasive in the digital economy. An important assumption in many deployed systems is that user consumption reflects user preferences in a static sense: users consume the content they like with no other considerations in mind. However, as we document in a large-scale online survey, users do choose content strategically to influence the types of content they get recommended in the future. We model this user behavior as a two-stage noisy signalling game between the recommendation system and users: the recommendation system initially commits to a recommendation policy and presents content to the users during a cold start phase, which the users choose to consume strategically in order to affect the types of content they will be recommended in a recommendation phase. We show that in equilibrium, users engage in behaviors that accentuate their differences from users of different preference profiles. In addition, (statistical) minorities, out of fear of losing their exposure to minority content, may not consume content that is liked by mainstream users. We next propose three interventions that may improve recommendation quality (both on average and for minorities) when t
Social media platforms are like giant arenas where users can rely on different content and express their opinions through likes, comments, and shares. However, do users welcome different perspectives or only listen to their preferred narratives? This paper examines how users explore the digital space and allocate their attention among communities on two social networks, Voat and Reddit. By analysing a massive dataset of about 215 million comments posted by about 16 million users on Voat and Reddit in 2019, we find that most users explore new communities at a decreasing rate, meaning they have a limited set of preferred groups they visit regularly. Moreover, we provide evidence that the preferred communities of users tend to cover similar topics throughout the year. We also find that communities have a high turnover of users, meaning that users come and go frequently, showing a volatility that strongly departs from a null model simulating users' behaviour.
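To make "exploring new communities at a decreasing rate" concrete, the following minimal sketch computes a discovery curve for one user: the cumulative number of distinct communities after each comment, and the share of comments per window that land in a community not seen before. The function names, the window size, and the comment history are illustrative assumptions, not the paper's actual measurement pipeline.

```python
from typing import Hashable, Iterable, List


def discovery_curve(visits: Iterable[Hashable]) -> List[int]:
    """Cumulative number of distinct communities after each comment."""
    seen = set()
    curve = []
    for community in visits:
        seen.add(community)
        curve.append(len(seen))
    return curve


def new_community_rate(curve: List[int], window: int) -> List[float]:
    """Share of comments in each full window that hit a not-yet-seen community."""
    rates = []
    previous = 0
    for end in range(window, len(curve) + 1, window):
        rates.append((curve[end - 1] - previous) / window)
        previous = curve[end - 1]
    return rates


# Hypothetical comment history of one user (community names are made up).
history = ["a", "b", "c", "a", "b", "d", "a", "a", "b", "a", "c", "a"]
curve = discovery_curve(history)
rates = new_community_rate(curve, window=4)
```

A curve that flattens, with window rates that fall towards zero, corresponds to a user who settles on a small set of preferred communities.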
Multi-user diversity is considered when the number of users in the system is random. The complete monotonicity of the error rate as a function of the (deterministic) number of users is established and it is proved that randomization of the number of users always leads to deterioration of average system performance at any average SNR. Further, using stochastic ordering theory, a framework for comparison of system performance for different user distributions is provided. For Poisson distributed users, the difference in error rate of the random and deterministic number of users cases is shown to asymptotically approach zero as the average number of users goes to infinity for any fixed average SNR. In contrast, for a finite average number of users and high SNR, it is found that randomization of the number of users deteriorates performance significantly, and the diversity order under fading is dominated by the smallest possible number of users. For Poisson distributed users communicating over Rayleigh faded channels, further closed-form results are provided for average error rate, and the asymptotic scaling law for ergodic capacity is also provided. Simulation results are provided to co
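The effect of a random number of users can be illustrated with a small closed-form sketch. As a stand-in for the error rate studied in the abstract, it uses the outage probability of max-SNR scheduling over i.i.d. Rayleigh fading, which is an assumption made here for simplicity: with N users the outage is c^N, where c = 1 - exp(-threshold / snr). This is completely monotone in N, and for a Poisson number of users with mean λ the average is exp(-λ(1 - c)) by the Poisson generating function. The function names and the threshold value are hypothetical.

```python
import math


def outage_fixed(n_users: float, snr: float, threshold: float = 1.0) -> float:
    """Outage of max-SNR scheduling with a deterministic number of users."""
    c = 1.0 - math.exp(-threshold / snr)  # single-user outage, Rayleigh fading
    return c ** n_users


def outage_poisson(mean_users: float, snr: float, threshold: float = 1.0) -> float:
    """Average outage when the number of users is Poisson with the given mean."""
    c = 1.0 - math.exp(-threshold / snr)
    # E[c^N] for N ~ Poisson(mean_users); N = 0 counts as a certain outage.
    return math.exp(-mean_users * (1.0 - c))


def gap(mean_users: float, snr: float) -> float:
    """Penalty of randomizing the number of users at equal average load."""
    return outage_poisson(mean_users, snr) - outage_fixed(mean_users, snr)
```

Under these assumptions the gap is never negative (Jensen's inequality), it vanishes as the mean number of users grows, and at high SNR the Poisson case saturates at exp(-λ), the probability of having no users at all, which mirrors the statement that the smallest possible number of users dominates.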
Recommender systems have advanced markedly over the past decade by transforming each user/item into a dense embedding vector with deep learning models. At industrial scale, embedding tables constituted by such vectors of all users/items demand a vast amount of parameters and impose heavy compute and memory overhead during training and inference, hindering model deployment under resource constraints. Existing solutions towards embedding compression either suffer from severely compromised recommendation accuracy or incur considerable computational costs. To mitigate these issues, this paper presents BACO, a fast and effective framework for compressing embedding tables. Unlike traditional ID hashing, BACO is built on the idea of exploiting collaborative signals in user-item interactions for user and item groupings, such that similar users/items share the same embeddings in the codebook. Specifically, we formulate a balanced co-clustering objective that maximizes intra-cluster connectivity while enforcing cluster-volume balance, and unify canonical graph clustering techniques into the framework through rigorous theoretical analyses. To produce effective groupings while averting codeboo
A new generation of aerial vehicles is expected to be the next frontier for the transportation of people and goods, making aerial users as important as ground users in communication systems. To enhance the coverage of aerial users, appropriate adjustments should be made to existing cellular networks, which mainly serve ground users through the down-tilted antennas of the terrestrial base stations (BSs). A promising approach is to up-tilt the antennas of a subset of BSs so that aerial users are served through the mainlobe. With this motivation, we use tools from stochastic geometry to analyze the coverage performance of the adjusted cellular network (consisting of the up-tilted BSs and the down-tilted BSs). We present exact and approximate expressions of the signal-to-interference ratio (SIR)-based coverage probabilities for users in the sky and on the ground, respectively. Numerical results verify the accuracy of the analysis and clarify the advantages of up-tilting BS antennas for the communication connectivity of aerial users without adversely affecting the quality of service (QoS) of ground users.
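As a reference point for what an SIR-based coverage probability looks like, the sketch below evaluates the well-known baseline for a single-tier Poisson network with nearest-BS association, Rayleigh fading, and no noise. It is not the paper's expression: antenna tilt, mainlobe and sidelobe gains, and user altitude are all left out, and the function names are hypothetical. For path-loss exponent 4 the baseline has a closed form; the general version integrates numerically.

```python
import math


def coverage_alpha4(threshold: float) -> float:
    """Baseline SIR coverage for path-loss exponent 4 (threshold in linear scale)."""
    root = math.sqrt(threshold)
    return 1.0 / (1.0 + root * (math.pi / 2.0 - math.atan(1.0 / root)))


def coverage_general(threshold: float, alpha: float, steps: int = 20000) -> float:
    """Same baseline for a path-loss exponent alpha > 2, via the midpoint rule.

    Uses rho = T^(2/alpha) * integral_0^{T^(2/alpha)} v^(alpha/2 - 2) / (1 + v^(alpha/2)) dv,
    obtained from the usual integral by the substitution u = 1/v.
    """
    upper = threshold ** (2.0 / alpha)
    h = upper / steps
    integral = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        integral += v ** (alpha / 2.0 - 2.0) / (1.0 + v ** (alpha / 2.0))
    integral *= h
    return 1.0 / (1.0 + upper * integral)
```

In this baseline the coverage probability does not depend on the BS density; the tilt-dependent expressions in the abstract refine this kind of result for aerial and ground users separately.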
A new problem formulation is presented for the Gaussian interference channels (GIFC) with two pairs of users, which are distinguished as primary users and secondary users, respectively. The primary users employ a pair of encoder and decoder that were originally designed to satisfy a given error performance requirement under the assumption that no interference exists from other users. In the scenario when the secondary users attempt to access the same medium, we are interested in the maximum transmission rate (defined as {\em accessible capacity}) at which secondary users can communicate reliably without affecting the error performance requirement by the primary users under the constraint that the primary encoder (not the decoder) is kept unchanged. By modeling the primary encoder as a generalized trellis code (GTC), we are then able to treat the secondary link and the cross link from the secondary transmitter to the primary receiver as finite state channels (FSCs). Based on this, upper and lower bounds on the accessible capacity are derived. The impact of the error performance requirement by the primary users on the accessible capacity is analyzed by using the concept of interferen
Should firms that apply machine learning algorithms in their decision-making make their algorithms transparent to the users they affect? Despite growing calls for algorithmic transparency, most firms have kept their algorithms opaque, citing potential gaming by users that may negatively affect the algorithm's predictive power. We develop an analytical model to compare firm and user surplus with and without algorithmic transparency in the presence of strategic users and present novel insights. We identify a broad set of conditions under which making the algorithm transparent benefits the firm. We show that, in some cases, even the predictive power of machine learning algorithms may increase if the firm makes them transparent. By contrast, users may not always be better off under algorithmic transparency. The results hold even when the predictive power of the opaque algorithm comes largely from correlational features and the cost for users to improve on them is close to zero. Overall, our results show that firms should not view manipulation by users as bad. Rather, they should use algorithmic transparency as a lever to motivate users to invest in more desirable features.
Most current approaches to characterize and detect hate speech focus on \textit{content} posted in Online Social Networks (OSNs). They struggle to collect and annotate hateful speech due to the incompleteness and noisiness of OSN text and the subjectivity of hate speech. These limitations are often handled with constraints that oversimplify the problem, such as considering only tweets containing hate-related words. In this work we partially address these issues by shifting the focus towards \textit{users}. We develop and employ a robust methodology to collect and annotate hateful users which does not depend directly on a lexicon and where the users are annotated given their entire profile. This results in a sample of Twitter's retweet graph containing $100,386$ users, out of which $4,972$ were annotated. We also collect the users who were banned in the three months that followed the data collection. We show that hateful users differ from normal ones in terms of their activity patterns, word usage, and network structure. We obtain similar results comparing the neighbors of hateful vs. neighbors of normal users and also suspended users vs. active users, increasing the robustn
The number of systems that collect vast amounts of data about users has grown rapidly during the last few years. Many of these systems contain data not only about people's characteristics but also about their relationships with other system users. From this kind of data it is possible to extract a social network that reflects the connections between the system's users. Moreover, the analysis of such a social network makes it possible to investigate different characteristics of its members and their linkages. One way of examining such a network is key user extraction. Key users are those who have the biggest impact on other network members as well as a big influence on network evolution. The knowledge obtained about these users makes it possible to investigate and predict changes within the network. This knowledge is therefore very important for people or companies who make a profit from the network, such as a telecommunication company. The second important thing is the ability to extract these users as quickly as possible, i.e., to develop an algorithm that is time-efficient in large social networks where the numbers of nodes and edges reach a few million. In this master thesis the method of key user extraction, which is
This paper describes the design of a dashboard and analysis pipeline to monitor users of visualization tools in the wild. Our pipeline describes how to extract analytical KPIs from extensive log event data involving a mix of user types. The resulting three-page dashboard displays live KPIs, helping analysts understand users, detect exploratory behaviors, plan education interventions, and improve tool features. We propose this case study as a motivation to use the dashboard approach for a more `casual' monitoring of users and building carer mindsets for visualization tools.
In a collaborative-filtering recommendation scenario, biases in the data will likely propagate in the learned recommendations. In this paper we focus on the so-called mainstream bias: the tendency of a recommender system to provide better recommendations to users who have a mainstream taste, as opposed to non-mainstream users. We propose NAECF, a conceptually simple but effective idea to address this bias. The idea consists of adding an autoencoder (AE) layer when learning user and item representations with text-based Convolutional Neural Networks. The AEs, one for the users and one for the items, serve as adversaries to the process of minimizing the rating prediction error when learning how to recommend. They enforce that the specific unique properties of all users and items are sufficiently well incorporated and preserved in the learned representations. These representations, extracted as the bottlenecks of the corresponding AEs, are expected to be less biased towards mainstream users, and to provide more balanced recommendation utility across all users. Our experimental results confirm these expectations, significantly improving the recommendations for non-mainstream users while
We propose a novel method for user-to-user interference (UUI) mitigation in dynamic time-division duplex multiple-input multiple-output communication systems with multi-antenna users. Specifically, we consider the downlink data transmission in the presence of UUI caused by a user that simultaneously transmits in uplink. Our method introduces an overhead for estimation of the user-to-user channels by transmitting pilots from the uplink user to the downlink users. Each downlink user obtains a channel estimate that is used to design a combining matrix for UUI mitigation. We analytically derive an achievable spectral efficiency for the downlink transmission in the presence of UUI with our mitigation technique. Through numerical simulations, we show that our method can significantly improve the spectral efficiency performance in cases of heavy UUI.
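The role of the user-to-user channel estimate can be shown with a deliberately small sketch. It assumes a two-antenna downlink user, a single-antenna uplink user, one downlink stream, no noise, and a zero-forcing combiner that is orthogonal to the estimated interference channel. The abstract does not state which combiner design the authors use, so the null-space choice, the function names, and all channel values below are hypothetical.

```python
from typing import List

Vector = List[complex]


def inner(w: Vector, x: Vector) -> complex:
    """Return the inner product w^H x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


def null_combiner(g_est: Vector) -> Vector:
    """Unit-norm combiner w with w^H g_est = 0 for a two-antenna receiver."""
    g1, g2 = g_est
    norm = (abs(g1) ** 2 + abs(g2) ** 2) ** 0.5
    return [g2.conjugate() / norm, -g1.conjugate() / norm]


# Hypothetical channels: h is the effective downlink channel, g the UUI channel.
h = [1 + 0j, 0.5j]
g = [0.8 - 0.2j, 0.3 + 0.4j]
s = 1 + 0j    # desired downlink symbol
u = -1 + 1j   # symbol sent by the interfering uplink user

# Received signal at the downlink user (noise omitted in this sketch).
y = [hi * s + gi * u for hi, gi in zip(h, g)]

w = null_combiner(g)              # assumes a perfect estimate of g
s_hat = inner(w, y) / inner(w, h)  # interference is removed before equalization
```

With a perfect estimate the interference term vanishes exactly; with an imperfect pilot-based estimate a residual remains, which is why the pilot overhead and the estimation quality both enter the achievable spectral efficiency.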
While most task-oriented dialogues assume conversations between the agent and one user at a time, dialogue systems are increasingly expected to communicate with multiple users simultaneously who make decisions collaboratively. To facilitate development of such systems, we release the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent. To collect this dataset, each user utterance from MultiWOZ 2.2 was replaced with a small chat between two users that is semantically and pragmatically consistent with the original user utterance, thus resulting in the same dialogue state and system response. These dialogues reflect interesting dynamics of collaborative decision-making in task-oriented scenarios, e.g., social chatter and deliberation. Supported by this data, we propose the novel task of multi-user contextual query rewriting: to rewrite a task-oriented chat between two users as a concise task-oriented query that retains only task-relevant information and that is directly consumable by the dialogue system. We demonstrate that in multi-user dialogues, using predicted rewrites substantially improves dialogue state tracking without modifying existing dialogue
In both academia and industry, multi-user multiple-input multiple-output (MU-MIMO) techniques have shown enormous gains in spectral efficiency by exploiting spatial degrees of freedom. So far, an underlying assumption in most existing MU-MIMO designs has been that all users employ an infinite blocklength, so that they can achieve the Shannon capacity. This setup, however, is not suitable for delay-constrained users, whose blocklength tends to be finite. In this paper, we consider a heterogeneous setting in MU-MIMO systems where delay-constrained users and delay-tolerant users coexist, called a DCTU-MIMO network. To maximize the sum spectral efficiency in this system, we present the spectral efficiency for delay-tolerant users and provide a lower bound of the spectral efficiency for delay-constrained users. We consider an optimization problem that maximizes the sum spectral efficiency of delay-tolerant users while satisfying the latency constraint of delay-constrained users, and propose a generalized power iteration (GPI) precoding algorithm that finds a principal precoding vector. Furthermore, we extend a DCTU-MIMO network to the multiple-time-slot scenario and propose
In an attempt to utilize spectrum resources more efficiently, protocols sharing licensed spectrum with unlicensed users are receiving increased attention. From the perspective of cellular networks, spectrum underutilization makes spatial reuse a feasible complement to existing standards. Interference management is a major component in designing these schemes, as it is critical that licensed users maintain their expected quality of service. We develop a distributed dynamic spectrum protocol in which ad-hoc device-to-device users opportunistically access the spectrum actively in use by cellular users. First, channel gain estimates are used to set feasible transmit powers for device-to-device users that keep the interference they cause within the allowed interference temperature. Then network information is distributed by route discovery packets in a random access manner to help establish either a single-hop or multi-hop route between two device-to-device users. We show that network information in the discovery packet can decrease the failure rate of the route discovery and reduce the number of necessary transmissions to find a route. Using the found route, we show that two device-to-
Massive multiple-input multiple-output (MIMO) for 5G is evolving into the extremely large-scale antenna array (ELAA) to increase the spectrum efficiency by orders of magnitude for 6G communications. ELAA introduces spherical-wave-based near-field communications, where channel capacity can be significantly improved for single-user and multi-user scenarios. Unfortunately, the near-field region at large incidence/emergence angles is greatly reduced with the widely studied uniform linear array (ULA). Thus, many randomly distributed users may fail to benefit from near-field communications. In this paper, we leverage the rotational symmetry of uniform circular array (UCA) to provide uniform and enlarged near-field regions at all angles, enabling more users to benefit from near-field communications. Specifically, by exploiting the geometrical relationship between UCA and users, the near-field beamforming technique for UCA is developed. Based on the analysis of near-field beamforming, we reveal that UCA is able to provide a larger near-field region than ULA in terms of the effective Rayleigh distance. Moreover, a concentric-ring codebook is designed to realize efficient codebook-based beam
Individual user fairness is commonly understood as treating similar users similarly. In Recommender Systems (RSs), several evaluation measures exist for quantifying individual user fairness. These measures evaluate fairness via either: (i) the disparity in RS effectiveness scores regardless of user similarity, or (ii) the disparity in items recommended to similar users regardless of item relevance. Both disparity in recommendation effectiveness and user similarity are very important in fairness, yet no existing individual user fairness measure simultaneously accounts for both. In brief, current user fairness evaluation measures implement a largely incomplete definition of fairness. To fill this gap, we present Pairwise User unFairness (PUF), a novel evaluation measure of individual user fairness that considers both effectiveness disparity and user similarity. PUF is the only measure that can express this important distinction. We empirically validate that PUF does this consistently across 4 datasets and 7 rankers, and robustly when varying user similarity or effectiveness. In contrast, all other measures are either almost insensitive to effectiveness disparity or completely insensi