Search - ResearchTracker

Agentic benchmarks increasingly rely on LLM-simulated users to scalably evaluate agent performance, yet the robustness, validity, and fairness of this approach remain unexamined. Through a user study with participants across the United States, India, Kenya, and Nigeria, we investigate whether LLM-simulated users serve as reliable proxies for real human users in evaluating agents on τ-Bench retail tasks. We find that user simulation lacks robustness, with agent success rates varying up to 9 percentage points across different user LLMs. Furthermore, evaluations using simulated users exhibit systematic miscalibration, underestimating agent performance on challenging tasks and overestimating it on moderately difficult ones. African American Vernacular English (AAVE) speakers experience consistently worse success rates and calibration errors than Standard American English (SAE) speakers, with disparities compounding significantly with age. We also find simulated users to be a differentially effective proxy for different populations, performing worst for AAVE and Indian English speakers. Additionally, simulated users introduce conversational artifacts and surface different failure patter

User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions

arXiv, 2024-09-01. Authors: Xianzhe Fan, Qing Xiao, Xuhui Zhou

Large language model-based AI companions are increasingly viewed by users as friends or romantic partners, leading to deep emotional bonds. However, they can generate biased, discriminatory, and harmful outputs. Recently, users are taking the initiative to address these harms and re-align AI companions. We introduce the concept of user-driven value alignment, where users actively identify, challenge, and attempt to correct AI outputs they perceive as harmful, aiming to guide the AI to better align with their values. We analyzed 77 social media posts about discriminatory AI statements and conducted semi-structured interviews with 20 experienced users. Our analysis revealed six common types of discriminatory statements perceived by users, how users make sense of those AI behaviors, and seven user-driven alignment strategies, such as gentle persuasion and anger expression. We discuss implications for supporting user-driven value alignment in future AI systems, where users and their communities have greater agency.

Search results: Users

Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations

User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions

Analysis of User Experience Evaluation Methods for Deaf users: A Case Study on a mobile App

Recommending to Strategic Users

Users volatility on Reddit and Voat

Multi-User Diversity with Random Number of Users

Balanced Co-Clustering of Users and Items for Embedding Table Compression in Recommender Systems

Dedicating Cellular Infrastructure for Aerial Users: Advantages and Potential Impact on Ground Users

Accessible Capacity of Secondary Users

Algorithmic Transparency with Strategic Users

Characterizing and Detecting Hateful Users on Twitter

Key User Extraction Based on Telecommunication Data (aka. Key Users in Social Network. How to find them?)

Show Me My Users: A Dashboard Visualizing User Interaction Logs

Leave No User Behind: Towards Improving the Utility of Recommender Systems for Non-mainstream Users

User-to-User Interference Mitigation in Dynamic TDD MIMO Systems with Multi-Antenna Users

Multi-User MultiWOZ: Task-Oriented Dialogues among Multiple Users

Precoding Design for Multi-user MIMO Systems with Delay-Constrained and -Tolerant Users

Spectrum Sharing Scheme Between Cellular Users and Ad-hoc Device-to-Device Users

Enabling More Users to Benefit from Near-Field Communications: From Linear to Circular Array

Measuring Individual User Fairness with User Similarity and Effectiveness Disparity