Reinforcement learning (RL) agents under partial observability often condition actions on internally accumulated information such as memory or inferred latent context. We formalise such information-conditioned interaction patterns as behavioural dependency: variation in action selection with respect to internal information under fixed observations. This induces a probe-relative notion of $ε$-behavioural equivalence and a within-policy behavioural distance that quantifies probe sensitivity. We establish three structural results. First, the set of policies exhibiting non-trivial behavioural dependency is not closed under convex aggregation. Second, behavioural distance contracts under convex combination. Third, we prove a sufficient local condition under which gradient ascent on a skewed mixture objective decreases behavioural distance when a dominant-mode gradient aligns with the direction of steepest contraction. Minimal bandit and partially observable gridworld experiments provide controlled witnesses of these mechanisms. In the examined settings, behavioural distance decreases under convex aggregation and under continued optimisation with skewed latent priors, and in these experi
Human decision-making is strongly influenced by cognitive biases, particularly under conditions of uncertainty and risk. While prior work has examined bias in single-step decisions with immediate outcomes and in human interaction with a single autonomous agent, comparatively little attention has been paid to decision-making under delayed outcomes involving multiple AI agents, where decisions at each step affect subsequent states. In this work, we study how delayed outcomes shape decision-making and responsibility attribution in a multi-agent human-AI task. Using a controlled game-based experiment, we analyze how participants adjust their behavior following positive and negative outcomes. We observe asymmetric responses to gains and losses, with stronger corrective adjustments after negative outcomes. Importantly, participants often fail to correctly identify the actions that caused failure and misattribute responsibility across AI agents, leading to systematic revisions of decisions that are weakly related to the underlying causes of poor performance. We refer to this phenomenon as a form of attribution bias, manifested as biased error attribution under delayed feedback. Our findin
Communities can sustainably manage shared resources (commons) through self-governance and cooperative norms, a central finding of Ostrom's theory of self-governance. However, real-world commons (e.g., fisheries, forests, and irrigation systems) are often governed under asymmetric power structures, where certain individuals or institutions possess disproportionate control over resource extraction and collective outcomes. As Large Language Models (LLMs) are increasingly explored as agents in synthetic governance simulations, understanding how LLM societies behave under asymmetric power structures is becoming increasingly important, yet existing evaluations largely ignore such asymmetries. We introduce Sovereignty over the Commons Simulation (SovSim), a generative multi-agent simulation framework that incorporates an agent with asymmetric power (boss or king) into a society of symmetric agents (workers or peasants), where all agents extract from a shared resource, collectively determining its sustainability over time. Across eleven state-of-the-art models, we find that introducing asymmetric power leads to severe breakdowns in cooperation and sustainability, with up to an 87.3% degrad
We introduce c-Lattice Aggregation, a fault-tolerant reconstruction problem for distributed verification under crash and Byzantine failures. In our setting, n asynchronous processes supervise a concurrent execution I: each process holds a local sample, and must collaboratively reconstruct I from partial, potentially overlapping observations. A protocol solves c-Lattice Aggregation if at least c correct processes output the complete execution I, while all correct outputs are comparable and bounded by I. This strengthens Lattice Agreement [Attiya, Herlihy and Rachman, 1995] and Byzantine Lattice Agreement [Di Luna et al., 2020; Zheng and Garg, 2020]. We parameterize inputs by a redundancy parameter x -- every element of I appears in at least x initial samples -- and establish tight feasibility thresholds. Under crash failures with at most t faulty processes, Lattice Aggregation is solvable if and only if x >= t + 1. Under Byzantine failures with t < n/3, c-Lattice Aggregation is solvable if and only if x >= 2t + c. All bounds are tight: we present matching algorithms based on SCD-broadcast [Imbs et al., 2018; Khanchandani and Wattenhofer, 2024] and indistinguishability-based
Motivated by the recent introduction and large-scale deployment of BBR congestion control algorithms, multiple studies have investigated the performance and fairness implications of this shift from loss-based to delay-based congestion control. Given the potential Internet-wide adoption of BBR, we must also consider its robustness in network and system scenarios. One such scenario is Cloud-based Virtual Machine (VM) networking - highly relevant in today's CDN-centric Internet. Interestingly, previous work has shown significant performance problems of BBRv1-2 running in Xen VMs, with BBR performance dropping to almost zero when CPU credit is low. In this paper, we develop a framework for measuring TCP throughput under fully controlled CPU contention, which uses Linux deadline scheduling to emulate generalized CPU contention conditions. Our measurements reveal that - in stark contrast to Cubic! - BBR throughput can break down during CPU contention under any hypervisor and all tested BDP conditions. Characterizing this performance degradation on a fine-granular level, we show that CPU limited BBR senders are capped at very low throughput levels below 10-20 Mbps. This finding implies th
Conformal prediction (CP) offers distribution-free marginal coverage guarantees under an exchangeability assumption, but these guarantees can fail if the data distribution shifts. We analyze the use of pseudo-calibration as a tool to counter this performance loss under a bounded label-conditional covariate shift model. Using tools from domain adaptation, we derive a lower bound on target coverage in terms of the source-domain loss of the classifier and a Wasserstein measure of the shift. Using this result, we provide a method to design pseudo-calibrated sets that inflate the conformal threshold by a slack parameter to keep target coverage above a prescribed level. Finally, we propose a source-tuned pseudo-calibration algorithm that interpolates between hard pseudo-labels and randomized labels as a function of classifier uncertainty. Numerical experiments show that our bounds qualitatively track pseudo-calibration behavior and that the source-tuned scheme mitigates coverage degradation under distribution shift while maintaining nontrivial prediction set sizes.
Malware classification models often suffer performance degradation under concept drift due to evolving threat landscapes and the emergence of novel malware families. This paper presents FARM (Few-shot Adaptive Recognition of Malware), a unified framework for detecting and adapting to both covariate drift and label drift in Windows Portable Executable (PE) malware family classification. FARM uses a triplet autoencoder to project samples into a discriminative latent space, enabling unsupervised drift detection through DBSCAN clustering and dynamic thresholding. To enable rapid adaptation, the framework employs a few-shot strategy that can incorporate new classes from only a small number of labeled samples. FARM also supports full retraining when sufficient drifted samples accumulate, allowing longer-term model updating. Experiments on the BenchMFC dataset show that FARM improves classification performance under covariate drift by 5.6%, and achieves an average F1 score of 0.85 on unseen malware families using few-shot adaptation, increasing to 0.94 after retraining. These results indicate that FARM provides an effective approach for drift-aware malware family classification in dynamic
Selective attention can momentarily alter visual appearance, but can such effects be learned? We tested whether training attention under sensory competition produces lasting changes in perceived contrast. Across seven days, participants trained on an orientation task with a fixed target location, with or without a salient distractor. Before and after training, we measured the point of subjective equality (PSE). Training under competition produced a reliable push-pull shift. Stimuli at the trained location appeared higher in contrast, whereas stimuli at the untrained location appeared lower. Conversely, training without distractors improved performance but did not alter appearance. Crucially, these opponent shifts were robust to task variations, persisting even in equality judgments designed to minimize response bias. Furthermore, the effect generalized to stimuli with novel orientation and contrast levels. These findings demonstrate that resolving sensory competition does not merely improve discrimination, but durably recalibrates the subjective appearance of the visual world.
We generalize the convergence results of an explosive autoregression, pioneered in Anderson (1959), in three ways: First, we demonstrate that the centered least-squares estimator converges geometrically to a ratio of limits, even in settings where the innovations are correlated and not centered around zero. Secondly, we demonstrate that the requirement of independent innovations in Anderson (1959), Theorem 2.3, can be relaxed to $α$-mixing. Third, we provide an autocorrelation-robust feasible test statistic for the explosive parameter under Gaussian ARMA innovations.
Polymer chain scission is a key mechanism for fracture of soft materials. It is well known from single-molecule force spectroscopy experiments that the critical condition for chain scission depends on the loading rate and other environmental effects (e.g., temperature and solvent). Common approaches to describing the kinetics of chain scission often assume force-controlled conditions, that is, when a polymer chain is stretched by a prescribed force. As a result of this assumption, chain scission is irreversible, excluding the possibility of healing. In many soft materials, however, self-healing has been observed after fracture, suggesting possibly reversible chain scission. Here, we show that reversible chain scission is possible under displacement-controlled conditions, that is, when a polymer chain is stretched with a prescribed end-to-end distance. We present a breakable freely-jointed chain model, assuming that a polymer chain breaks when one of its links breaks while the other links remain nearly rigid. At a prescribed end-to-end distance, the free energy of the chain has two local minima and a local maximum (the transition state), giving rise to energy barriers for chain scis
Agricultural weed detection on edge devices is subject to strict constraints on model capacity, computational resources, and real-time inference latency, which prevent performance improvements through model scaling or ensembling. This paper proposes Model-Driven Data Correction (MDDC), a data-centric framework that enhances detection performance by iteratively diagnosing and correcting data quality deficiencies. An automated error analysis procedure categorizes detection failures into four types: false negatives, false positives, class confusion, and localization errors. These error patterns are systematically addressed through a structured train-fix-retrain pipeline with version-controlled data management. Experimental results on multiple weed detection datasets demonstrate consistent improvements of 5-25 percent in mAP at 0.5 using a fixed lightweight detector (YOLOv8n), indicating that systematic data quality optimization can effectively alleviate performance bottlenecks under fixed model capacity constraints.
Peer incentivization (PI) is a popular multi-agent reinforcement learning approach where all agents can reward or penalize each other to achieve cooperation in social dilemmas. Despite their potential for scalable cooperation, current PI methods heavily depend on fixed incentive values that need to be appropriately chosen with respect to the environmental rewards and thus are highly sensitive to their changes. Therefore, they fail to maintain cooperation under changing rewards in the environment, e.g., caused by modified specifications, varying supply and demand, or sensory flaws - even when the conditions for mutual cooperation remain the same. In this paper, we propose Dynamic Reward Incentives for Variable Exchange (DRIVE), an adaptive PI approach to cooperation in social dilemmas with changing rewards. DRIVE agents reciprocally exchange reward differences to incentivize mutual cooperation in a completely decentralized way. We show how DRIVE achieves mutual cooperation in the general Prisoner's Dilemma and empirically evaluate DRIVE in more complex sequential social dilemmas with changing rewards, demonstrating its ability to achieve and maintain cooperation, in contrast to curr
We introduce Ergodic Deviation-Robust Equilibrium (EDRE), a dynamics-relative equilibrium concept for repeated finite games in which agents learn via entropic mirror descent (EMD). EDRE requires three properties to hold simultaneously for the same profile and learning run: (E1) the limit profile is an $\varepsilon$-Nash equilibrium at a product distribution; (E2) along the entire learning trajectory, every fixed coalition's cumulative aggregate (summed-unilateral) deviation gain is $\tilde{\mathcal{O}}(\sqrt{T})$ with high probability; and (E3) the limit profile is a fixed point of the EMD map, so that it is selected by the dynamics rather than merely certified as an equilibrium. We prove that the $\sqrt{T}$ deviation-regret rate is order-tight, establish existence in exact-potential games (via Nash's theorem, with a constructive proximal route under concavity) together with Lyapunov monotonicity of EMD (and pointwise convergence when the fixed-point set is a singleton), and extend the selection property to monotone polymatrix games through variational inequalities. Although a static EDRE coincides with an $\varepsilon$-Nash equilibrium, its content is dynamic: robust (positive-mea
Distributed denial-of-service (DDoS) attacks threaten the availability of Internet of Things (IoT) infrastructures, particularly under resource-constrained deployment conditions. Although transfer learning models have shown promising detection accuracy, their reliability, computational feasibility, and interpretability in operational environments remain insufficiently explored. This study presents an explainability-aware empirical evaluation of seven pre-trained convolutional neural network architectures for multi-class IoT DDoS detection using the CICDDoS2019 dataset and an image-based traffic representation. The analysis integrates performance metrics, reliability-oriented statistics (MCC, Youden Index, confidence intervals), latency and training cost assessment, and interpretability evaluation using Grad-CAM and SHAP. Results indicate that DenseNet and MobileNet-based architectures achieve strong detection performance while demonstrating superior reliability and compact, class-consistent attribution patterns. DenseNet169 offers the strongest reliability and interpretability alignment, whereas MobileNetV3 provides an effective latency-accuracy trade-off for fog-level deployment.
MLLM-as-a-Judge is conventionally validated by agreement with human annotations, but this metric is undefined when the human pool is culturally heterogeneous. We introduce VOIR DIRE, a multimodal benchmark of 626 culturally paired image--prompt artifacts spanning U.S. and mainland Chinese contexts across food, fashion, and architecture, with annotator pools that are within-pool reliable (a = 0.86/0.74) but cross-pool divergent on evaluation (Q1 r = -0.12). Across six MLLMs, the bias decomposes into two failures: a positivity-floor calibration failure (compressed scale use) and an orientation failure (default to one cultural norm). On this corpus, where contested items are sampled to split the two pools, the floor mechanically validates the more-permissive Chinese reading; persona prompting partially recovers calibration, but the orientation residual survives, evidence the tilt is not reducible to scale compression. Reference-pool in-context demonstrations deepen the orientation residual and inflate the high end rather than restoring use of the low end. Model origin adds a small additive tilt (~0.10 MAE) that is approximately invariant under demonstration. We recommend reporting ali
Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific kind of structural fragility compared to human writing. We propose Luminol-AIDetect, a novel, zero-shot statistical approach that exposes this fragility through coherence disruption. By applying a simple randomized text-shuffling procedure, we demonstrate that the resulting shift in perplexity serves as a principled, model-agnostic discriminant, as MGT displays a characteristic dispersion in perplexity-under-shuffling that differs markedly from the more stable structural variability of human-written text. Luminol-AIDetect leverages this distinction to inform its decision process, where a handful of perplexity-based scalar features are extracted from an input text and its shuffled version, then detection is performed via density estimation and ensemble-based prediction. Evaluated across 8 content domains, 11 adversarial attack types, and 18 languages, Luminol-AIDetect
Climate-economic modeling under uncertainty presents significant computational challenges that may limit policymakers' ability to address climate change effectively. This paper explores neural network-based approaches for solving high-dimensional optimal control problems arising from models that incorporate ambiguity aversion in climate mitigation decisions. We develop a continuous-time endogenous-growth economic model that accounts for multiple mitigation pathways, including emission-free capital and carbon intensity reductions. Given the inherent complexity and high dimensionality of these models, traditional numerical methods become computationally intractable. We benchmark several neural network architectures against finite-difference generated solutions, evaluating their ability to capture the dynamic interactions between uncertainty, technology transitions, and optimal climate policy. Our findings demonstrate that appropriate neural architecture selection significantly impacts both solution accuracy and computational efficiency when modeling climate-economic systems under uncertainty. These methodological advances enable more sophisticated modeling of climate policy decisions
Davydova-Lashkin-Fokas-Lenells equation (DLFLE) is a gauged equivalent form of Fokas-Lenells equation (FLE) that addresses both spatio-temporal dispersion (STD) and nonlinear dispersion (ND) effects. The balance between those effects results a soliton which has always been an interesting topic in research due to its potential applicability as signal carrier in information technology. We have induced a variation to the dispersion effects and apply Hirota bilinear method to realise soliton solution of the proposed DLFLE and explore how the soliton dynamic behaves in accordance to the variation of the dispersion effects. The proposed equation is applicable for number of systems like ultrashort optical pulse, ioncyclotron plasma wave, Bose-Einstein condensate (BEC) matter-wave soliton under certain external fields, etc. The study on such systems under varying effects is very limited and we hope our work can benefit the researchers to understand soliton dynamics more and work on various other nonlinear fields under varying effects.
Large Language Models (LLMs) have shown remarkable progress in multiple-choice question answering (MCQA), but their inherent unreliability, such as hallucination and overconfidence, limits their application in high-risk domains. To address this, we propose a frequency-based uncertainty quantification method under black-box settings, leveraging conformal prediction (CP) to ensure provable coverage guarantees. Our approach involves multiple independent samplings of the model's output distribution for each input, with the most frequent sample serving as a reference to calculate predictive entropy (PE). Experimental evaluations across six LLMs and four datasets (MedMCQA, MedQA, MMLU, MMLU-Pro) demonstrate that frequency-based PE outperforms logit-based PE in distinguishing between correct and incorrect predictions, as measured by AUROC. Furthermore, the method effectively controls the empirical miscoverage rate under user-specified risk levels, validating that sampling frequency can serve as a viable substitute for logit-based probabilities in black-box scenarios. This work provides a distribution-free model-agnostic framework for reliable uncertainty quantification in MCQA with guaran
Inferring causal relationships between variable pairs in the observational study is crucial but challenging, due to the presence of unmeasured confounding. While previous methods employed the negative controls to adjust for the confounding bias, they were either restricted to the discrete setting (i.e., all variables are discrete) or relied on strong assumptions for identification. To address these problems, we develop a general nonparametric approach that accommodates both discrete and continuous settings for testing causal hypothesis under unmeasured confounders. By using only a single negative control outcome (NCO), we establish a new identification result based on a newly proposed integral equation that links the outcome and NCO, requiring only the completeness and mild regularity conditions. We then propose a kernel-based testing procedure that is more efficient than existing moment-restriction methods. We derive the asymptotic level and power properties for our tests. Furthermore, we examine cases where our procedure using only NCO fails to achieve identification, and introduce a new procedure that incorporates a negative control exposure (NCE) to restore identifiability. We