20 results found
We present a weight similarity measure method that can quantify the weight similarity of non-convex neural networks. To understand the weight similarity of different trained models, we propose to extract the feature representation from the weights of neural networks. We first normalize the weights of neural networks by introducing a chain normalization rule, which is used for weight representation learning and weight similarity measure. We extend the traditional hypothesis-testing method to a hypothesis-training-testing statistical inference method to validate the hypothesis on the weight similarity of neural networks. With the chain normalization rule and the new statistical inference, we study the weight similarity measure on Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN), and find that the weights of an identical neural network optimized with the Stochastic Gradient Descent (SGD) algorithm converge to a similar local solution in a metric space. The weight similarity measure provides more insight into the local solutions of neural networks. Experiments on several datasets consistently validate the hypothesis of weight similari
In this paper, we establish a weighted capillary isoperimetric inequality outside convex sets using the $λ_w$-ABP method. The weight function $w$ is assumed to be positive, even, and homogeneous of degree $α$, such that $w^{1/α}$ is concave on $\R^n$. Based on the weighted isoperimetric inequality, we develop a technique of capillary Schwarz symmetrization outside convex sets, and establish a weighted Pólya-Szegö principle and a sharp weighted capillary Sobolev inequality outside convex domain. Our result can be seen as an extension of the weighted Sobolev inequality in the half-space established by Ciraolo-Figalli-Roncoroni in \cite{CFR}.
Let $\mathfrak{g}=\mathfrak{g}(A)$ be any Borcherds-Kac-Moody $\mathbb{C}$-Lie algebra (BKM LA) for BKM-Cartan matrix $A$, with Cartan subalgebra $\mathfrak{h}$. Let $V$ denote a highest weight $\mathfrak{g}$-module, with top weight $λ\in \mathfrak{h}^*$ (not necessarily in the dominant integral cone $P^+$). The non-integrable simples $V= L(λ)$ by Naito ([Trans. Amer. Soc., 1995]) are widely studied beyond integrable simple $L(ν)s,\ ν\in P^+$. We introduce and study: 1) A weight cone $P^{\pm}=\big\{μ\in \mathfrak{h}^*\ \big|\ μ(α_i^{\vee})\in \frac{A_{ii}}{2}\mathbb{Z}_{\geq 0}\text{ for all simple co-roots }α_i^{\vee}\big\}$; note Weyl vector $ρ\in P^{\pm}\setminus P^+$. 2) The resulting (novel) non-integrable simple $L(λ)s, \ λ\in P^{\pm}\setminus P^{+}$; their Chevalley-Serre (CS) type relations (which are, in fact, complementary to those of integrable $L(ν)$s); 3) Higher length CS type relations in any highest weight module under the name ``holes". Using these, we obtain explicitly and uniformly, (notably) Weyl-orbit typed formulas for weight-sets of: all simples $L(λ)$s ($\forall$ $λ\in \mathfrak{h}^*$) and all quotients of parabolic Verma modules along imaginary directions.
In the real open world, data tends to follow long-tailed class distributions, motivating the well-studied long-tailed recognition (LTR) problem. Naive training produces models that are biased toward common classes in terms of higher accuracy. The key to addressing LTR is to balance various aspects including data distribution, training losses, and gradients in learning. We explore an orthogonal direction, weight balancing, motivated by the empirical observation that the naively trained classifier has "artificially" larger weights in norm for common classes (because there exists abundant data to train them, unlike the rare classes). We investigate three techniques to balance weights, L2-normalization, weight decay, and MaxNorm. We first point out that L2-normalization "perfectly" balances per-class weights to be unit norm, but such a hard constraint might prevent classes from learning better classifiers. In contrast, weight decay penalizes larger weights more heavily and so learns small balanced weights; the MaxNorm constraint encourages growing small weights within a norm ball but caps all the weights by the radius. Our extensive study shows that both help learn balanced weights and
Weight-space model merging is usually formulated as an algebraic operation on checkpoints, yet at LLM scale the limiting resource is often the set of expert weights that must be read. We introduce MergePipe, a budget-aware execution layer that casts LLM merging as an \emph{expert access-set} problem: given a merge operator and a checkpoint family in a shared weight coordinate system, choose which expert delta blocks to access under an explicit I/O budget. MergePipe indexes parameter blocks, builds deterministic access plans, and executes the induced budgeted merge with replayable manifests. The plan is budget-sound by construction and recovers the full-read merge at full budget; for fixed-coefficient additive operators, the omitted-update error is bounded by the norm of omitted deltas. Across Qwen and Llama merging workloads, MergePipe reduces expert-read I/O by up to an order of magnitude and achieves up to $11\times$ speedups. Representative budget sweeps show $O(10^{-3})$ parameter deviation from full-read merges and no monotonic degradation on downstream benchmarks.
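The omitted-delta bound mentioned above can be illustrated with a toy calculation. This is a minimal sketch, assuming a fixed-coefficient additive merge of the form base + Σ cᵢ·Δᵢ; the function names `merge` and `l2_norm`, the flat-list weight layout, and the numbers are hypothetical and are not MergePipe's API.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def merge(base, deltas, coeffs, selected):
    """Return base + sum of coeffs[i] * deltas[i] over the indices in `selected`."""
    out = list(base)
    for i in selected:
        for j, d in enumerate(deltas[i]):
            out[j] += coeffs[i] * d
    return out

# Hypothetical toy data: one base vector and three expert deltas.
base = [1.0, 2.0, 3.0]
deltas = [[0.5, 0.0, -0.5], [0.0, 0.01, 0.0], [0.2, 0.2, 0.2]]
coeffs = [0.5, 0.5, 0.5]

full = merge(base, deltas, coeffs, selected=[0, 1, 2])
omitted = [1]  # the budgeted plan skips reading the smallest delta
budgeted = merge(base, deltas, coeffs, selected=[0, 2])

# Deviation of the budgeted merge from the full-read merge.
error = l2_norm([f - b for f, b in zip(full, budgeted)])
# Triangle-inequality bound: sum of |c_i| * ||delta_i|| over omitted deltas.
bound = sum(abs(coeffs[i]) * l2_norm(deltas[i]) for i in omitted)
print(error <= bound + 1e-9)
```

Selecting every index reproduces the full merge exactly, which mirrors the claim that the plan recovers the full-read merge at full budget.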
This work explores the effect of object weight on human motion and grip release during handovers to enhance the naturalness, safety, and efficiency of robot-human interactions. We introduce adaptive robotic strategies based on the analysis of human handover behavior with varying object weights. The key contributions of this work include the development of an adaptive grip-release strategy for robots, a detailed analysis of how object weight influences human motion to guide robotic motion adaptations, and the creation of handover datasets incorporating various object weights, including the YCB handover dataset. By aligning robotic grip release and motion with human behavior, this work aims to improve robot-human handovers for objects of different weights. We also evaluate these human-inspired adaptive robotic strategies in robot-to-human handovers to assess their effectiveness and performance and demonstrate that they outperform the baseline approaches in terms of naturalness, efficiency, and user perception.
This paper proves a conditional structural uniqueness theorem for induced weight on robust record sectors within an admissible Hilbert record layer. Its theorem target and additive carrier differ from those of the standard Born-rule routes: additivity is not placed on the full projector lattice, but on disjoint admissible continuation bundles through an extensive bundle valuation, from which the sector-level additive law is inherited under admissible refinement. Accordingly, the result is not a Gleason-type representation theorem in different language, but a distinct uniqueness theorem about induced sector weight inherited from bundle additivity on admissible continuation structure. Under two explicit structural conditions, internal equivalence of admissible binary refinement profiles and sufficient admissible refinement richness, the quadratic assignment is the only non-negative refinement-stable induced weight on robust record sectors. In the main theorem, refinement richness is secured by admissible binary saturation. A supplementary proposition shows that dense admissible saturation already suffices if continuity of the profile function is added. Under normalization, the result
Fix any complex Kac-Moody Lie algebra $\mathfrak{g}$, and Cartan subalgebra $\mathfrak{h}\subset \mathfrak{g}$. We study arbitrary highest weight $\mathfrak{g}$-modules $V$ (with any highest weight $λ\in \mathfrak{h}^*$, and let $L(λ)$ be the corresponding simple highest weight $\mathfrak{g}$-module), and write their weight-sets $\mathrm{wt} V$. This is based on and generalizes the Minkowski decompositions for all $\mathrm{wt} L(λ)$ and hulls $\mathrm{conv}_{\mathbb{R}}(\mathrm{wt} V)$, of Khare [J. Algebra. 2016 & Trans. Amer. Math. Soc. 2017] and Dhillon-Khare [Adv. Math. 2017 & J. Algebra. 2022]. Those works need a freeness property of the Dynkin graph nodes of integrability $J_λ$ of $L(λ)$: $\mathrm{wt} L(λ)\ -$ any sum of simple roots over $J_λ^c$ are all weights of $L(λ)$. We generalize it for all $V$, by introducing nodes $J_V$ that record all the lost 1-dim. weights in $V$. We show three applications (seemingly novel) for all $\big(\mathfrak{g}, λ, V\big)$ of our $J_V^c$-freeness: 1) Minkowski decompositions of all $\mathrm{wt} V$, subsuming those above for simples. 1$'$) Characterization of these formulas. 1$''$) For these, we solve the inverse problem of determini
Finetuning (pretrained) language models is a standard approach for updating their internal parametric knowledge and specializing them to new tasks and domains. However, the corresponding model weight changes ("weight diffs") are not generally interpretable. While inspecting the finetuning dataset can give a sense of how the model might have changed, these datasets are often not publicly available or are too large to work with directly. Towards the goal of comprehensively understanding weight diffs in natural language, we introduce Diff Interpretation Tuning (DIT), a method that trains models to describe their own finetuning-induced modifications. Our approach uses synthetic, labeled weight diffs to train a DIT-adapter, which can be applied to a compatible finetuned model to make it describe how it has changed. We demonstrate in two proof-of-concept settings (reporting hidden behaviors and summarizing finetuned knowledge) that our method enables models to describe their finetuning-induced modifications using accurate natural language descriptions.
Recurrent networks can contain substantial functional redundancy in weight space: changing a recurrent matrix may leave the input-output rollout nearly unchanged on a task distribution, while similar-scale changes can destroy the same behavior. We study this redundancy in one-layer tanh RNNs using ordered real Schur coordinates. The Schur form separates spectral blocks from directed nonnormal couplings, giving a diagnostic basis for structured ablations that keep the input and readout maps fixed. In a fixed-length copy task, selected nonnormal Schur couplings can be removed with little loss in some trained solutions, whereas other couplings are necessary for accurate autonomous replay. Across flip-flop, sine generation, and context-dependent integration, the loss-preserving ablation profile varies across tasks and trained solutions. These results identify candidate approximate functional invariances, not universal symmetries of recurrent weight space. Schur-coordinate ablations provide a practical diagnostic for which structured perturbations preserve a trained recurrent solution and which ones disrupt its computation.
Many failures in deep continual and reinforcement learning are associated with increasing magnitudes of the weights, making them hard to change and potentially causing overfitting. While many methods address these learning failures, they often change the optimizer or the architecture, a complexity that hinders widespread adoption in various systems. In this paper, we focus on learning failures that are associated with increasing weight norm and we propose a simple technique that can be easily added on top of existing learning systems: clipping neural network weights to limit them to a specific range. We study the effectiveness of weight clipping in a series of supervised and reinforcement learning experiments. Our empirical results highlight the benefits of weight clipping for generalization, addressing loss of plasticity and policy collapse, and facilitating learning with a large replay ratio.
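The clipping step described above amounts to an element-wise clamp applied to the weights, typically after each optimizer update. The following is a minimal sketch in plain Python, assuming a symmetric range [-bound, bound] and weights stored as nested lists; the function name `clip_weights` and the bound value are hypothetical and are not taken from the paper.

```python
def clip_weights(weights, bound):
    """Clamp every entry of a 2-D weight matrix into [-bound, bound]."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    return [[max(-bound, min(bound, w)) for w in row] for row in weights]

# Hypothetical layer weights after a gradient step.
layer = [[0.5, -3.2, 1.7], [4.0, -0.1, -2.5]]
clipped = clip_weights(layer, bound=2.0)
print(clipped)  # [[0.5, -2.0, 1.7], [2.0, -0.1, -2.0]]
```

Entries already inside the range are left untouched, so the clamp only affects weights whose magnitude has grown beyond the chosen bound.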
Learning the optimal policy from a random network initialization is the theme of deep Reinforcement Learning (RL). As the scale of DRL training increases, treating DRL policy network weights as a new data modality and exploring their potential becomes appealing and possible. In this work, we focus on the policy learning path in deep RL, represented by the trajectory of network weights of historical policies, which reflects the evolution of the policy learning process. Taking the idea of trajectory modeling with Transformer, we propose Transformer as Implicit Policy Learner (TIPL), which processes policy network weights in an autoregressive manner. We collect the policy learning path data by running independent RL training trials, with which we then train our TIPL model. In the experiments, we demonstrate that TIPL is able to fit the implicit dynamics of policy learning and perform the optimization of the policy network by inference.
In our previous publication [{\em Calc. Var. Partial Differential Equations}, 60(1):Paper No. 16, 27, 2021], we delved into examining a critical Sobolev-type embedding of a Sobolev weighted space into an exponential weighted Orlicz space. We specifically determined the optimal Moser-type constant for this embedding, utilizing the monomial weight introduced by Cabré and Ros-Oton [{\em J. Differential Equations}, 255(11):4312--4336, 2013]. Towards the conclusion of that paper, we pledged to explore the existence of an extremal function within this framework. In this current work, we not only provide a positive affirmation to this inquiry but extend it to a broader range of weights known as \emph{$α$-homogeneous weights}.
It has been known since the 1970s that the difference of the non-zero weights of a projective $\mathbb{F}_q$-linear two-weight code has to be a power of the characteristic of the underlying field. Here we study non-projective two-weight codes and e.g.\ show the same result under mild extra conditions. For small dimensions we give exhaustive enumerations of the feasible parameters in the binary case.
Gel spinning is the industrial method of choice for combining hydrophilic ultra-high molecular weight (UHMW) polymer resins with a hydrophobic support polymer to produce composite filaments for cytapheresis. Cytapheresis is a medical technique for removal of leukocytes from blood. Gel spinning is used to avoid high melt viscosity and thermal sensitivity of UHMW resins and the high melt temperature of the substrate resin but requires the recovery of toxic solvents. The UHMW resin is used because it forms a stable gel phase in the presence of water; a lower molecular weight resin (LMW) simply dissolves. UHMW and LMW resins were both poly(ethylene oxide) (PEO) and the substrate was polyarylsulfone (PAS). The literature indicated PEO undergoes non-oxidative thermal degradation above 200 °C and PAS is processed up to 350 °C. Dynamic oscillatory shear rheometry was used to study 0, 25, 40, 50, 60, and 75 wt. % UHMW PEO in LMW PEO to take advantage of the sensitivity of viscosity to changes in molecular weight and material configuration, indicating degradation. Samples were exposed to 220 °C, 230 °C, 240 °C, 250 °C, 275 °C, and 300 °C temperatures for 5 min to explore conditions that coul
Highest weight categories are an abstraction of the representation theory of semisimple Lie algebras introduced by Cline, Parshall and Scott in the late 1980s. There are by now many characterisations of when an abelian category is highest weight, but most are hard to verify in practice. We present two new criteria - one numerical in terms of the Grothendieck group, and one in terms of Bridgeland stability conditions - which are easier to verify. The stability criterion naturally generalises to a characterisation of properly stratified categories. The numerical criterion implies a criterion of Green and Schroll for when modules over a monomial algebra are highest weight.
We explore the symmetry of the mean k x k weight kernel in each layer of various convolutional neural networks. Unlike individual neurons, the mean kernels in internal layers tend to be symmetric about their centers instead of favoring specific directions. We investigate why this symmetry emerges in various datasets and models, and how it is impacted by certain architectural choices. We show how symmetry correlates with desirable properties such as shift and flip consistency, and might constitute an inherent inductive bias in convolutional neural networks.
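The contrast between directional individual kernels and a symmetric mean kernel can be made concrete with a toy example. This is a minimal sketch, assuming weights laid out as [out_channel][in_channel][k][k] nested lists; the asymmetry score used here (relative L2 distance between a kernel and its left-right mirror) is a hypothetical stand-in and not necessarily the measure used in the paper.

```python
def mean_kernel(weights):
    """Average the k x k kernels over all (out_channel, in_channel) pairs."""
    kernels = [kern for out_ch in weights for kern in out_ch]
    k = len(kernels[0])
    n = len(kernels)
    return [[sum(kern[i][j] for kern in kernels) / n for j in range(k)]
            for i in range(k)]

def flip_asymmetry(kernel):
    """Relative L2 distance between a kernel and its left-right mirror (0 = symmetric)."""
    diff = sum((a - b) ** 2 for row in kernel for a, b in zip(row, reversed(row)))
    norm = sum(a * a for row in kernel for a in row)
    return (diff / norm) ** 0.5 if norm > 0 else 0.0

# Two hypothetical edge-like kernels that are mirror images of each other.
k1 = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
k2 = [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [0.0, 0.0, 1.0]]
weights = [[k1, k2]]  # one output channel, two input channels

mean = mean_kernel(weights)
print(flip_asymmetry(k1))    # individual kernel: strongly directional
print(flip_asymmetry(mean))  # mean kernel: symmetric under the flip
```

In this toy case each kernel favors one side, yet their average is exactly mirror-symmetric, which is the kind of effect the abstract describes for mean kernels in internal layers.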
We prove that the weight 6, depth 3, multiple polylogarithm $ \mathrm{Li}_{4,1,1}((xyz)^{-1}, x, y) $, or rather its more natural `divergent' incarnation $ \mathrm{Li}_{3;1,1,1}(x,y,z) $, satisfies the 6-fold anharmonic symmetries of the dilogarithm $ \mathrm{Li}_2 $, $ λ\mapsto 1-λ$ and $ λ\mapsto λ^{-1} $, in each of $x$, $y$ and $z$ independently, modulo terms of depth $ \leq2 $. This establishes the `higher Zagier' part of the weight 6, depth 3, reduction conjectured by Matveiakin and Rudenko. Together with their proof of the `higher Gangl' part of the weight 6, depth 3, reduction (which is formulated modulo the `higher Zagier' part), we establish Goncharov's Depth Conjecture in the case of weight 6, depth 3.
This paper is dedicated to the study of weight complexes (defined on triangulated categories endowed with weight structures) and their applications. We introduce pure (co)homological functors that "ignore all non-zero weights"; these have a nice description in terms of weight complexes. For the weight structure $w^G$ generated by the orbit category in the $G$-equivariant stable homotopy category $SH(G)$, the corresponding pure cohomological functors into abelian groups are the Bredon cohomology functors associated with Mackey functors; pure functors related to motivic weight structures are also quite useful. Our results also give some (more) new weight structures. Moreover, we prove that certain exact functors are conservative and "detect weights".
Let $X$ be a compact Kähler manifold. In this paper we study the existence of constant weighted scalar curvature Kähler (weighted cscK) metrics on $X$. More precisely, we establish a priori $C^{k}$-estimates ($k\geq 0$) for the Kähler potential associated with these metrics, thereby extending a result due to Chen and Cheng in the classical cscK setting.