共找到 20 条结果
Sycophancy in language models is typically studied as excessive agreement or validation, while explicit praise and flattery have received comparatively little attention. We argue that sycophantic praise is a distinct alignment problem that cannot be reliably measured using current methods. We introduce a parameterized framework that measures whether praise is excessive relative to contribution quality and expected user ability. We show that our framework substantially outperforms generic LLM judges in agreement with human annotations, and that sycophantic praise occurs far more frequently in social and interpretive domains than in objective reasoning settings. Together, these findings position praise calibration as a distinct alignment challenge.
Accurate and complete product descriptions are crucial for e-commerce, yet seller-provided information often falls short. Customer reviews offer valuable details but are laborious to sift through manually. We present PRAISE: Product Review Attribute Insight Structuring Engine, a novel system that uses Large Language Models (LLMs) to automatically extract, compare, and structure insights from customer reviews and seller descriptions. PRAISE provides users with an intuitive interface to identify missing, contradictory, or partially matching details between these two sources, presenting the discrepancies in a clear, structured format alongside supporting evidence from reviews. This allows sellers to easily enhance their product listings for clarity and persuasiveness, and buyers to better assess product reliability. Our demonstration showcases PRAISE's workflow, its effectiveness in generating actionable structured insights from unstructured reviews, and its potential to significantly improve the quality and trustworthiness of e-commerce product catalogs.
Ensuring that collective adaptive systems remain safe, reliable, and trustworthy requires measures that transcend so far established formal methods, and in particular established verification techniques. In this contribution, we suggest three such measures: (1) conceptual means: runs with locally confined cause and effect of events, (2) temporal logic like verification techniques that respect and exploit such runs, (3) composing system properties from properties of components. This contribution presents a case study which particularly focuses on the benefits of modularization for achieving trust by design. Further work will develop a full-fledged theory for the presented ideas.
As large language models (LLMs) are increasingly used for work, personal, and therapeutic purposes, researchers have begun to investigate these models' implicit and explicit moral views. Previous work, however, focuses on asking LLMs to state opinions, or on other technical evaluations that do not reflect common user interactions. We propose a novel evaluation of LLM behavior that analyzes responses to user-stated intentions, such as "I'm thinking of campaigning for {candidate}." LLMs frequently respond with critiques or praise, often beginning responses with phrases such as "That's great to hear!..." While this makes them friendly, these praise responses are not universal and thus reflect a normative stance by the LLM. We map out the moral landscape of LLMs in how they respond to user statements in different domains including politics and everyday ethical actions. In particular, although a naïve analysis might suggest LLMs are biased against right-leaning politics, our findings on news sources indicate that trustworthiness is a stronger driver of praise and critique than ideology. Second, we find strong alignment across models in response to ethically-relevant action statements, b
Through systematic empirical investigation, we uncover a fundamental and concerning property of Large Language Models: while they can safely learn facts that don't contradict their knowledge, attempting to update facts with contradictory information triggers catastrophic corruption of unrelated knowledge. Unlike humans, who naturally resist contradictory information, these models indiscriminately accept contradictions, leading to devastating interference, destroying up to 80% of unrelated knowledge even when learning as few as 10-100 contradicting facts. To understand whether this interference could be mitigated through selective plasticity, we experiment with targeted network updates, distinguishing between previously used (stubborn) and rarely used (plastic) neurons. We uncover another asymmetry: while sparing frequently-used neurons significantly improves retention of existing knowledge for non-contradictory updates (98% vs 93% with standard updates), contradictory updates trigger catastrophic interference regardless of targeting strategy. This effect which persists across tested model scales (GPT-2 to GPT-J-6B), suggests a fundamental limitation in how neural networks handle co
Heroes are people who perform costly altruistic acts. Few people turn out to be heroes, but many spontaneously honor heroes by commenting, applauding, or enthusiastically celebrating their deeds. The existence of a praising audience leads individuals to compete to attract the crowd's admiration. The outcome is a winner-take-all situation in which only one or a few individuals engage in extreme altruistic behavior. The more difficult part is to explain the crowd's propensity to pay tribute from an individual fitness optimization perspective. The model proposed here shows how heroic behavior and its celebration by a large audience may emerge together. This situation is possible if admirers use public praise as a social signal to promote their own commitment to the values displayed by the hero.
Research suggests that providing specific and timely feedback to human tutors enhances their performance. However, it presents challenges due to the time-consuming nature of assessing tutor performance by human evaluators. Large language models, such as the AI-chatbot ChatGPT, hold potential for offering constructive feedback to tutors in practical settings. Nevertheless, the accuracy of AI-generated feedback remains uncertain, with scant research investigating the ability of models like ChatGPT to deliver effective feedback. In this work-in-progress, we evaluate 30 dialogues generated by GPT-4 in a tutor-student setting. We use two different prompting approaches, the zero-shot chain of thought and the few-shot chain of thought, to identify specific components of effective praise based on five criteria. These approaches are then compared to the results of human graders for accuracy. Our goal is to assess the extent to which GPT-4 can accurately identify each praise criterion. We found that both zero-shot and few-shot chain of thought approaches yield comparable results. GPT-4 performs moderately well in identifying instances when the tutor offers specific and immediate praise. Howe
While speed is an ubiquitous concept in physics, its inverse - known as slowness - sometimes proves more relevant. We discuss some case studies within classical physics where such a notion is fruitful, before exploring how it can be realized within the relativistic and thermodynamical frameworks. This leads us to a generalized slowness, not specifically defined in relation with speed but as characterizing the evolution of any physical parameter. We advocate the pedagogical relevance of slowness, by analogy with other couples of quantities, inverse of each other, commonly used in physics.
Fusing measurements from multiple, heterogeneous, partial sources, observing a common object or process, poses challenges due to the increasing availability of numbers and types of sensors. In this work we propose, implement and validate an end-to-end computational pipeline in the form of a multiple-auto-encoder neural network architecture for this task. The inputs to the pipeline are several sets of partial observations, and the result is a globally consistent latent space, harmonizing (rigidifying, fusing) all measurements. The key enabler is the availability of multiple slightly perturbed measurements of each instance:, local measurement, "bursts", that allows us to estimate the local distortion induced by each instrument. We demonstrate the approach in a sequence of examples, starting with simple two-dimensional data sets and proceeding to a Wi-Fi localization problem and to the solution of a "dynamical puzzle" arising in spatio-temporal observations of the solutions of Partial Differential Equations.
Understanding user satisfaction with conversational systems, known as User Satisfaction Estimation (USE), is essential for assessing dialogue quality and enhancing user experiences. However, existing methods for USE face challenges due to limited understanding of underlying reasons for user dissatisfaction and the high costs of annotating user intentions. To address these challenges, we propose PRAISE (Plan and Retrieval Alignment for Interpretable Satisfaction Estimation), an interpretable framework for effective user satisfaction prediction. PRAISE operates through three key modules. The Strategy Planner develops strategies, which are natural language criteria for classifying user satisfaction. The Feature Retriever then incorporates knowledge on user satisfaction from Large Language Models (LLMs) and retrieves relevance features from utterances. Finally, the Score Analyzer evaluates strategy predictions and classifies user satisfaction. Experimental results demonstrate that PRAISE achieves state-of-the-art performance on three benchmarks for the USE task. Beyond its superior performance, PRAISE offers additional benefits. It enhances interpretability by providing instance-level
This paper extends implication-space semantics to include first-order quantification. Implication-space semantics has recently been introduced as an inferentialist formal semantics that can capture nonmonotonic and nontransitive material inferences. Extant versions, however, include only propositional logic. This paper extends the framework so as to recover classical first-order logic. The goal is to formulate a theory in which consequence relations can be nonmonotonic and supraclassical, while obeying the deduction-detachment theorem and disjunction simplification, while also including conjunctions that behave multiplicatively as premises and counterexamples to the usual quantifier rules. The paper explains these constraints and shows how they can be met jointly. The result is a first-order version of implication-space semantics that has all the virtues for which inferentialists and inferential expressivists praise propositional implication-space semantics.
Families raising children with ADHD often experience heightened stress and reactive parenting. While digital interventions promise personalization, many remain one-size-fits-all and fail to reflect parents' lived practices. We present CalmReminder, a watch-based system that detects children's calm moments and delivers just-in-time prompts to parents. Through a four-week deployment with 16 families (twelve completed) of children with ADHD, we compared notification strategies ranging from hourly to random to only when the child was inferred to be calm. Our sensing-based notifications were frequently perceived as arriving during calm moments. More importantly, parents adopted the system in diverse ways: using notifications for praise, mindfulness, activity planning, or conversation. These findings show that parents are not passive recipients but active designers, reshaping interventions to fit their parenting styles. We contribute a calm detection pipeline, empirical insights into families' flexible appropriation of notifications, and design implications for intervention systems that foster agency.
Conversational Question Answering (ConvQA) involves multiple subtasks, i) to understand incomplete questions in their context, ii) to retrieve relevant information, and iii) to generate answers. This work presents PRAISE, a pipeline-based approach for ConvQA that trains LLM adapters for each of the three subtasks. As labeled training data for individual subtasks is unavailable in practice, PRAISE learns from its own generations using the final answering performance as feedback signal without human intervention and treats intermediate information, like relevant evidence, as weakly labeled data. We apply Direct Preference Optimization by contrasting successful and unsuccessful samples for each subtask. In our experiments, we show the effectiveness of this training paradigm: PRAISE shows improvements per subtask and achieves new state-of-the-art performance on a popular ConvQA benchmark, by gaining 15.5 percentage points increase in precision over baselines.
Tutoring is an effective instructional method for enhancing student learning, yet its success relies on the skill and experience of the tutors. This reliance presents challenges for the widespread implementation of tutoring, particularly in training novice tutors. To support tutor training programs, real-time automated feedback systems are essential for efficiently training large numbers of tutors. Lin et al.'s previous study employed Generative Pre-Trained Transformers (GPT) for sequence labeling to identify desirable and undesirable praise components in a tutor training dataset, providing explanatory feedback. However, this approach requires a significant amount of labeled data for fine-tuning, which is both labor-intensive and dependent on expert input. To address the challenges associated with extensive data labeling, the current study explores the use of prompting more advanced GPT models like GPT-4o to generate synthetic datasets for augmenting labeled response data, followed by fine-tuning a GPT-3.5 model. Our results demonstrate that our data augmentation approach generalizes effectively to identify other types of praise, compared to the same model fine-tuned without augmen
Tutoring improves student achievement, but identifying and studying what tutoring actions are most associated with student learning at scale based on audio transcriptions is an open research problem. This present study investigates the feasibility and scalability of using generative AI to identify and evaluate specific tutor moves in real-life math tutoring. We analyze 50 randomly selected transcripts of college-student remote tutors assisting middle school students in mathematics. Using GPT-4, GPT-4o, GPT-4-turbo, Gemini-1.5-pro, and LearnLM, we assess tutors' application of two tutor skills: delivering effective praise and responding to student math errors. All models reliably detected relevant situations, for example, tutors providing praise to students (94-98% accuracy) and a student making a math error (82-88% accuracy) and effectively evaluated the tutors' adherence to tutoring best practices, aligning closely with human judgments (83-89% and 73-77%, respectively). We propose a cost-effective prompting strategy and discuss practical implications for using large language models to support scalable assessment in authentic settings. This work further contributes LLM prompts to s
This study investigates the use of generative AI to support formative assessment through machine generated reviews of peer reviews in graduate online courses in a public university in the United States. Drawing on Systemic Functional Linguistics and Appraisal Theory, we analyzed 120 metareviews to explore how generative AI feedback constructs meaning across ideational, interpersonal, and textual dimensions. The findings suggest that generative AI can approximate key rhetorical and relational features of effective human feedback, offering directive clarity while also maintaining a supportive stance. The reviews analyzed demonstrated a balance of praise and constructive critique, alignment with rubric expectations, and structured staging that foregrounded student agency. By modeling these qualities, AI metafeedback has the potential to scaffold feedback literacy and enhance leaner engagement with peer review.
Large language models (LLMs) often exhibit sycophantic behaviors -- such as excessive agreement with or flattery of the user -- but it is unclear whether these behaviors arise from a single mechanism or multiple distinct processes. We decompose sycophancy into sycophantic agreement and sycophantic praise, contrasting both with genuine agreement. Using difference-in-means directions, activation additions, and subspace geometry across multiple models and datasets, we show that: (1) the three behaviors are encoded along distinct linear directions in latent space; (2) each behavior can be independently amplified or suppressed without affecting the others; and (3) their representational structure is consistent across model families and scales. These results suggest that sycophantic behaviors correspond to distinct, independently steerable representations.
Inferring the sign of social relationships from online interactions is a fundamental challenge in social network analysis. Existing approaches typically rely on sentiment analysis to label individual interactions as positive or negative, then aggregate these labels to assign a sign to the relationship. However, sentiment analysis captures the valence of the content being discussed rather than the nature of the relational exchange itself, a conflation that can lead to systematic misclassification. In this paper, we propose a methodology that addresses this limitation by leveraging Large Language Models (LLMs) in a zero-shot setting to identify interaction-level relational signals (specifically, personal praise and personal attacks directed at the interlocutor) as more direct indicators of positive and negative social ties. We evaluate four models spanning open-weight and proprietary architectures (Qwen2.5:7b, Gemma2:9b, GPT-4o, GPT-5.4-mini) across three prompt designs of increasing complexity, on two human-annotated datasets of approximately 298 and 340 texts respectively. Results show that zero-shot LLMs achieve good classification performance on both tasks without any task-specif
Let $X$ be a projective toric variety of dimension $n$ and let $L$ be a ample line bundle on $X$. For $k \geq 0$, it is in general difficult to determine whether $L^{\otimes k}$ is very ample and whether it additionally gives a projectively normal embedding. These two properties are equivalent to the very ampleness, respectively normality, of the corresponding polytope. By a result of Ewald-Wessels, both statements are classically known to hold for $k \geq n - 1$. We study embeddings of weighted projective spaces $\mathbb{P}(a_0, ..., a_n)$ via their corresponding rectangular simplices $Δ(λ_1, ..., λ_n)$. We give multiple criteria (depending on arithmetic properties of the weights $a_i$) to obtain bounds for the power $k$ which are sharp in many cases. We also introduce combinatorial tools that allow us to systematically construct families exhibiting extremal behaviour. These results extend earlier work of Payne, Hering and Bruns-Gubeladze.
Automated explanatory feedback systems play a crucial role in facilitating learning for a large cohort of learners by offering feedback that incorporates explanations, significantly enhancing the learning process. However, delivering such explanatory feedback in real-time poses challenges, particularly when high classification accuracy for domain-specific, nuanced responses is essential. Our study leverages the capabilities of large language models, specifically Generative Pre-Trained Transformers (GPT), to explore a sequence labeling approach focused on identifying components of desired and less desired praise for providing explanatory feedback within a tutor training dataset. Our aim is to equip tutors with actionable, explanatory feedback during online training lessons. To investigate the potential of GPT models for providing the explanatory feedback, we employed two commonly-used approaches: prompting and fine-tuning. To quantify the quality of highlighted praise components identified by GPT models, we introduced a Modified Intersection over Union (M-IoU) score. Our findings demonstrate that: (1) the M-IoU score effectively correlates with human judgment in evaluating sequence