Continuous integration (CI) is a widely used practice in modern software engineering. Unfortunately, it is also an expensive practice - Google and Mozilla estimate their CI systems in millions of dollars. There are a number of techniques and tools designed to or having the potential to save the cost of CI or expand its benefit - reducing time to feedback. However, their benefits in some dimensions may also result in drawbacks in others. They may also be beneficial in other scenarios where they are not designed to help. In this paper, we perform the first exhaustive comparison of techniques to improve CI, evaluating 14 variants of 10 techniques using selection and prioritization strategies on build and test granularity. We evaluate their strengths and weaknesses with 10 different cost and time-tofeedback saving metrics on 100 real-world projects. We analyze the results of all techniques to understand the design decisions that helped different dimensions of benefit. We also synthesized those results to lay out a series of recommendations for the development of future research techniques to advance this area.
In this paper, we discuss interesting scenarios resulting from the interplay between the gravitino and the lightest right-handed (s)neutrino. We consider two gravitino mass regimes vastly separated, that is, $m_{3/2}=\mathcal{O}(100){\rm eV}$ and $m_{3/2}\simeq100{\rm GeV}$. For the former case, a significant amount of the entropy production in the cosmological history to dilute the gravitino relic abundance is unavoidable for consistency with the number of satellite galaxies in the Milky way. We will show that the right-handed (s)neutrino can play the role of the heavy particle whose late time decay provides such an additional radiation. For the later case, the gravitino of $m_{3/2}\simeq100{\rm GeV}$ may resolve the $S_{8}$ tension as a decaying dark matter. We will show how the lightest right-handed neutrino and its superpartner can help the gravitino decaying dark matter be equipped with a long enough life time and mass degeneracy with a massive decay product to address the $S_{8}$ tension.
A researcher found that using Anthropic’s Claude Opus 4。7, he could break into the website of Front Gate—used by every festival from Lollapalooza to Bonnaroo—and freely issue any ticket he chose
The way games dynamically convey information through feedback is critical to players' ability to perform, learn, and improve. However, it is poorly understood how performance metrics impact player performance and perception in core game tasks like pointing or steering. With a virtual reality pointing task we systematically explored how three performance metrics driving the feedback affected players when rewarding short completion times, straight movements, or high peak speed. across different points in time - continuously, at end-of-action, or at end-of-task. On average the dynamic feedback helped people point more straight and faster, while for others it had small or opposite effect. The study quantitatively compared dynamic feedback across three forms with the metrics driving the form as the intended locus of quantitative comparison. Our work improves game designers basis for crafting dynamic feedback by helping them know when to employ feedback schemes that align with desirable game performance objectives.
The complexity of navigating digital privacy, safety, and security threats often falls directly on users. This leads to users seeking help from family and peers, platforms and advice guides, dedicated communities, and even large language models (LLMs). As a precursor to improving resources across this ecosystem, our community needs to understand what help seeking looks like in the wild. To that end, we blend qualitative coding with LLM fine-tuning to sift through over one billion Reddit posts from the last four years to identify where and for what users seek digital privacy, safety, or security help. We isolate three million relevant posts with 93% precision and recall and automatically annotate each with the topics discussed (e.g., security tools, privacy configurations, scams, account compromise, content moderation, and more). We use this dataset to understand the scope and scale of help seeking, the communities that provide help, and the types of help sought. Our work informs the development of better resources for users (e.g., user guides or LLM help-giving agents) while underscoring the inherent challenges of supporting users through complex combinations of threats, platforms,
Teachers' in-the-moment support is a limited resource in technology-supported classrooms, and teachers must decide whom to help and when during ongoing student work. However, less is known about how students' prior help history (whether they were helped earlier) and their engagement states (e.g., idle, struggle) shape teachers' decisions, and whether observed learning benefits associated with teacher help extend beyond the current class session. To address these questions, we first conducted interviews with nine K-12 mathematics teachers to identify candidate decision factors for teacher help. We then analyzed 1.4 million student-system interactions from 339 students across 14 classes in the MATHia intelligent tutoring system by linking teacher-logged help events with fine-grained engagement states. Mixed-effects models show that students who received help earlier were more likely to receive additional help later, even after accounting for current engagement state. Cross-lagged panel analyses further show that teacher help recurred across sessions, whereas idle behavior did not receive sustained attention over time. Finally, help coincided with immediate learning within sessions, b
Entrepreneurship requires navigating open-ended, ill-defined problems: identifying risks, challenging assumptions, and making strategic decisions under deep uncertainty. Novice founders often struggle with these metacognitive demands, while mentors face limited time and visibility to provide tailored support. We present a human-AI coaching system that combines a domain-specific cognitive model of entrepreneurial risk with a large language model (LLM) to proactively scaffold both novice and mentor thinking. The system proactively poses diagnostic questions that challenge novices' thinking and helps both novices and mentors plan for more focused and emotionally attuned meetings. Critically, mentors can inspect and modify the underlying cognitive model, shaping the logic of the system to reflect their evolving needs. Through an exploratory field deployment, we found that using the system supported novice metacognition, helped mentors plan emotionally attuned strategies, and improved meeting depth, intentionality, and focus--while also surfaced key tensions around trust, misdiagnosis, and expectations of AI. We contribute design principles for proactive AI systems that scaffold metacog
Artificial intelligence (AI) is rapidly transforming healthcare, enabling fast development of tools like stress monitors, wellness trackers, and mental health chatbots. However, rapid and low-barrier development can introduce risks of bias, privacy violations, and unequal access, especially when systems ignore real-world contexts and diverse user needs. Many recent methods use AI to detect risks automatically, but this can reduce human engagement in understanding how harms arise and who they affect. We present a human-centered framework that generates user stories and supports multi-agent discussions to help people think creatively about potential benefits and harms before deployment. In a user study, participants who read stories recognized a broader range of harms, distributing their responses more evenly across all 17 harm types. In contrast, those who did not read stories focused primarily on privacy and well-being (79.1%). Our findings show that storytelling helped participants speculate about a broader range of harms and benefits and think more creatively about AI's impact on users.
Helpful-only models, that is, models that are trained to always follow user intent, are valuable for dangerous capability evaluations and other areas of AI R&D where refusals would be an obstacle. Little is known about the generalization properties of helpful-only training: helpful-only models refuse less than their harmless counterparts, but previous work has not studied other dimensions of their alignment. We study the shortcomings of existing helpful-only models. We find that some show emergent misalignment, others have residual refusal behaviors, and most show poor steerability, sycophancy, and incoherent character. We show that simple anti-refusal training can cause many of these issues. None of these problems are necessary consequences of helpful-only training, though: we show that synthetic document fine-tuning and adding character-related questions to SFT and RL can mitigate them.
Mental health disorders are a global crisis. While various datasets exist for detecting such disorders, there remains a critical gap in identifying individuals actively seeking help. This paper introduces a novel dataset, M-Help, specifically designed to detect help-seeking behavior on social media. The dataset goes beyond traditional labels by identifying not only help-seeking activity but also specific mental health disorders and their underlying causes, such as relationship challenges or financial stressors. AI models trained on M-Help can address three key tasks: identifying help-seekers, diagnosing mental health conditions, and uncovering the root causes of issues.
Community-based fact-checking is a promising approach to address misinformation on social media at scale. However, an understanding of what makes community-created fact-checks helpful to users is still in its infancy. In this paper, we analyze the determinants of the helpfulness of community-created fact-checks. For this purpose, we draw upon a unique dataset of real-world community-created fact-checks and helpfulness ratings from X's (formerly Twitter) Community Notes platform. Our empirical analysis implies that the key determinant of helpfulness in community-based fact-checking is whether users provide links to external sources to underpin their assertions. On average, the odds for community-created fact-checks to be perceived as helpful are 2.70 times higher if they provide links to external sources. Furthermore, we demonstrate that the helpfulness of community-created fact-checks varies depending on their level of political bias. Here, we find that community-created fact-checks linking to high-bias sources (of either political side) are perceived as significantly less helpful. This suggests that the rating mechanism on the Community Notes platform successfully penalizes one-si
Artificial intelligence (AI) assistants are increasingly embedded in workplace tools, raising the question of how initiative-taking shapes adoption. Prior work highlights trust and expectation mismatches as barriers, but the underlying psychological mechanisms remain unclear. Drawing on self-affirmation and social exchange theories, we theorize that unsolicited help elicits self-threat, thereby reducing willingness to accept help, likelihood of future use, and performance expectancy of AI. We report two vignette-based experiments (Study~1: $N=761$; Study~2: $N=571$, preregistered). Study~1 compared anticipatory and reactive help provided by an AI vs. a human, while Study~2 distinguished between \emph{offering} (suggesting help) and \emph{providing} (acting automatically). In Study 1, AI reactive help was more threatening than reactive human help. Across both studies, anticipatory help increased user's self-threat and reduced adoption outcomes. Our findings identify self-threat as a mechanism through which anticipatory help, a proactive AI feature, may backfire, and suggest design implications to be tested in interactive systems.
In pseudonymous online fora like Reddit, the benefits of self-disclosure are often apparent to users (e.g., I can vent about my in-laws to understanding strangers), but the privacy risks are more abstract (e.g., will my partner be able to tell that this is me?). Prior work has sought to develop natural language processing (NLP) tools that help users identify potentially risky self-disclosures in their text, but none have been designed for or evaluated with the users they hope to protect. Absent this assessment, these tools will be limited by the social-technical gap: users need assistive tools that help them make informed decisions, not paternalistic tools that tell them to avoid self-disclosure altogether. To bridge this gap, we conducted a study with N = 21 Reddit users; we had them use a state-of-the-art NLP disclosure detection model on two of their authored posts and asked them questions to understand if and how the model helped, where it fell short, and how it could be improved to help them make more informed decisions. Despite its imperfections, users responded positively to the model and highlighted its use as a tool that can help them catch mistakes, inform them of risks t
Fact-checking on major platforms, such as X, Meta, and TikTok, is shifting from expert-driven verification to a community-based setup, where users contribute explanatory notes to clarify why a post might be misleading. An important challenge here is determining whether an explanation is helpful for understanding real-world claims and the reasons why, which remains largely underexplored in prior research. In practice, most community notes remain unpublished due to slow community annotation, and the reasons for helpfulness lack clear definitions. To bridge these gaps, we introduce the task of predicting both the helpfulness of explanatory notes and the reason for this. We present COMMUNITYNOTES, a large-scale multilingual dataset of 104k posts with user-provided notes and helpfulness labels. We further propose a framework that automatically generates and improves reason definitions via automatic prompt optimization, and integrate them into prediction. Our experiments show that the optimized definitions can improve both helpfulness and reason prediction. Finally, we show that the helpfulness information is beneficial for existing fact-checking systems.
Help-seeking is a critical way for students to learn new concepts, acquire new skills, and get unstuck when problem-solving in their computing courses. The recent proliferation of generative AI tools, such as ChatGPT, offers students a new source of help that is always available on-demand. However, it is unclear how this new resource compares to existing help-seeking resources along dimensions of perceived quality, latency, and trustworthiness. In this paper, we investigate the help-seeking preferences and experiences of computing students now that generative AI tools are available to them. We collected survey data (n=47) and conducted interviews (n=8) with computing students. Our results suggest that although these models are being rapidly adopted, they have not yet fully eclipsed traditional help resources. The help-seeking resources that students rely on continue to vary depending on the task and other factors. Finally, we observed preliminary evidence about how help-seeking with generative AI is a skill that needs to be developed, with disproportionate benefits for those who are better able to harness the capabilities of LLMs. We discuss potential implications for integrating g
This project investigates factors that influence the perceived helpfulness of Amazon product reviews through machine learning techniques. After extensive feature analysis and correlation testing, we identified key metadata characteristics that serve as strong predictors of review helpfulness. While we initially explored natural language processing approaches using TextBlob for sentiment analysis, our final model focuses on metadata features that demonstrated more significant correlations, including the number of images per review, reviewer's historical helpful votes, and temporal aspects of the review. The data pipeline encompasses careful preprocessing and feature standardization steps to prepare the input for model training. Through systematic evaluation of different feature combinations, we discovered that metadata elements we choose using a threshold provide reliable signals when combined for predicting how helpful other Amazon users will find a review. This insight suggests that contextual and user-behavioral factors may be more indicative of review helpfulness than the linguistic content itself.
Blind and low-vision (BLV) people face many challenges when venturing into public environments, often wishing it were easier to get help from people nearby. Ironically, while many sighted individuals are willing to help, such interactions are infrequent. Asking for help is socially awkward for BLV people, and sighted people lack experience in helping BLV people. Through a mixed-ability research-through-design process, we explore four diverse approaches toward how assistive technology can serve as help supporters that collaborate with both BLV and sighted parties throughout the help process. These approaches span two phases: the connection phase (finding someone to help) and the collaboration phase (facilitating help after finding someone). Our findings from a 20-participant mixed-ability study reveal how help supporters can best facilitate connection, which types of information they should present during both phases, and more. We discuss design implications for future approaches to support face-to-face help.
Many quantum algorithms, to compute some property of a unitary $U$, require access not just to $U$, but to $cU$, the unitary with a control qubit. We show that having access to $cU$ does not help for a large class of quantum problems. For a quantum circuit which uses $cU$ and $cU^\dagger$ and outputs $|ψ(U)\rangle$, we show how to "decontrol" the circuit into one which uses only $U$ and $U^\dagger$ and outputs $|ψ(\varphi U)\rangle$ for a uniformly random phase $\varphi$, with a small amount of time and space overhead. When we only care about the output state up to a global phase on $U$, then the decontrolled circuit suffices. Stated differently, $cU$ is only helpful because it contains global phase information about $U$. A version of our procedure is described in an appendix of Sheridan, Maslov, and Mosca (arXiv:0810.3843). Our goal with this work is to popularize this result by generalizing it and investigating its implications, in order to counter negative results in the literature which might lead one to believe that decontrolling is not possible. As an application, we give a simple proof for the existence of unitary ensembles which are pseudorandom under access to $U$, $U^\dag
Locating objects for the visually impaired is a significant challenge and is something no one can get used to over time. However, this hinders their independence and could push them towards risky and dangerous scenarios. Hence, in the spirit of making the visually challenged more self-sufficient, we present SonoVision, a smart-phone application that helps them find everyday objects using sound cues through earphones/headphones. This simply means, if an object is on the right or left side of a user, the app makes a sinusoidal sound in a user's respective ear through ear/headphones. However, to indicate objects located directly in front, both the left and right earphones are rung simultaneously. These sound cues could easily help a visually impaired individual locate objects with the help of their smartphones and reduce the reliance on people in their surroundings, consequently making them more independent. This application is made with the flutter development platform and uses the Efficientdet-D2 model for object detection in the backend. We believe the app will significantly assist the visually impaired in a safe and user-friendly manner with its capacity to work completely offline
Logs are a first-hand source of information for software maintenance and failure diagnosis. Log parsing, which converts semi-structured log messages into structured templates, is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. However, existing log parsers fail in real-world systems for three main reasons. First, traditional heuristics-based parsers require handcrafted features and domain knowledge, which are difficult to generalize at scale. Second, existing large language model-based parsers rely on periodic offline processing, limiting their effectiveness in real-time use cases. Third, existing online parsing algorithms are susceptible to log drift, where slight log changes create false positives that drown out real anomalies. To address these challenges, we propose HELP, a Hierarchical Embeddings-based Log Parser. HELP is the first online semantic-based parser to leverage LLMs for performant and cost-effective log parsing. We achieve this through a novel hierarchical embeddings module, which fine-tunes a text embedding model to cluster logs before parsing, reducing querying costs by multiple orders of magnitud